ABSTRACT


At the center of software reuse is the search and retrieval of software components 
from large software libraries. Recent research has illuminated a promising approach 
called multi-level filtering that breaks the problem up into a series of increasingly 
stringent filters that move along a continuum of high-recall, low-precision syntactic 
techniques towards the more computationally expensive, high-precision semantic 
techniques. 

In multi-level filtering, syntactic matching is decomposed into two phases: profile 
filtering and signature matching. This thesis presents improvements to the resolution of 
syntactic profiles where the intent is to increase precision without a loss in recall during 
profile filtering. Large integer representation of profiles and profile lookup tables lead to 
an optimal time-and-space solution to profile representation. Finally, a new approach to 
signature matching is proposed that provides early pruning of the search space in an
effort to cut down the time it takes to find valid signature maps.

The resulting software is mature enough for future integration with the other
elements of multi-level filtering as well as inclusion in a CASE tool such as CAPS.
I. INTRODUCTION


Effective software reuse becomes increasingly important as the cost and
complexity of software development escalate. At the center of the issue is the search
and retrieval of software components from large software libraries. When enterprises that 
encourage the creation of reusable software components succeed in their efforts they are 
often met with the discouraging reality that large software bases are difficult to use. 
Issues such as query formulation, component storage, component retrieval, and 
presentation of query results must all be addressed with the same technology/usability 
tradeoffs that accompany most tools. Searching and retrieving components in large 
software bases has typically been plagued by poor recall and precision, slow algorithms, 
and demanding query requirements. 

In an effort to address such shortcomings, the literature has shed light on 
numerous techniques for searching and retrieving components in large software bases but 
they usually fall short due to their narrow approaches. A promising new hybrid approach 
called multi-level filtering combines many of the traditional aspects of search and 
retrieval, such as keyword matching, syntactic matching, and semantic matching. The 
method breaks the problem up into a series of increasingly stringent filters that move 
along a continuum of high-recall, low-precision syntactic techniques towards the more 
computationally expensive, high-precision semantic techniques. This thesis is focused on 
improving the syntactic matching filters used early in the process of multi-level filtering. 

To begin, section II reviews the relevant literature leading up to multi-level
filtering. Section III discusses the architecture of multi-level filtering in more detail,
including the decomposition of syntactic matching into its two phases of profile filtering
and signature matching. Section IV presents improvements to the resolution of syntactic
profiles¹ that can increase precision without a loss in recall during profile filtering. Also
presented are improvements to the internal representation of syntactic profiles that lead to
an optimal time-and-space solution to profile representation. Section V outlines
improvements to signature matching that provide early pruning of the search space in an
effort to cut down the time it takes to find valid signature maps. Section VI discusses the
effectiveness of the improvements through a series of experiments. Section VII draws
some conclusions and suggests areas for future research. The last subsection of each of
sections IV and V contains a detailed design of the improvements, and the appendix contains
the source code representing the design’s implementation. The resulting software is mature
enough for future integration with the other elements of multi-level filtering as well as
inclusion in a CASE tool such as CAPS.

¹ Unique to the multi-level filtering method, a syntactic profile is a normalized representation of a software
component’s syntactic properties. A more detailed definition is found in section IV.


II. BACKGROUND


A sampling of previous work in software component search and retrieval is
presented in this section to provide some background and basis for the ideas proposed in
this thesis.


A. KEYWORD MATCHING


The classical and somewhat popular approach to software search and retrieval has 
been the employment of keyword matching. Components are assigned keywords that 
describe their attributes and functionality. Queries are specified with keywords and a 
simple search through the software base for components with matching keywords returns 
the candidate set of components. Such an approach breaks down, however, as the size of 
the software base increases. A large set of keywords can cause loss of recall and small 
sets of keywords can cause loss of precision. 

[11] improves on the classical keyword technique by utilizing a faceted approach 
that better structures the terms used for classifying the components. Terms chosen from a 
set of facets are used to categorize all the components. This facilitates a closer fit of 
terms and reduces the problem of deciding the best keyword to use from a fixed set of 
standard keywords. 

Among the problems with keyword-based approaches is the inherent requirement
of a well-versed librarian. The infamous garbage-in/garbage-out principle certainly
applies to populating the software base. If the librarian does not have appropriate
domain knowledge for each component admitted into the software base, then the
keywords will not be chosen correctly and penalties in recall and precision during search
and retrieval will ensue.

A long-overdue use of keyword matching is to apply it alongside other
techniques. The multi-level filtering method in [9] is an example of such a hybrid
approach. The results of keyword matching are summarized in a computed keyword ratio
that can be used to determine if a candidate should be forwarded to the next filter. If
problems with recall and precision emerge, the keyword filter threshold can be adjusted
or the keyword filter can be deactivated altogether.
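The threshold-and-bypass behavior of such a keyword filter can be sketched as follows. This is a minimal illustration and not the filter from [9]: the text does not define how the keyword ratio is computed, so the formula used here (the fraction of query keywords found on the component), the function names, and the sample library are all assumptions.

```python
def keyword_ratio(query_keywords, component_keywords):
    """Fraction of query keywords present on the component (assumed formula)."""
    query = {k.lower() for k in query_keywords}
    if not query:
        return 1.0  # an empty query constrains nothing
    component = {k.lower() for k in component_keywords}
    return len(query & component) / len(query)


def keyword_filter(query_keywords, library, threshold=0.5, enabled=True):
    """Return the names of components whose ratio meets the threshold.

    `library` maps a component name to its keyword list. With enabled=False
    the filter is deactivated and every component is forwarded.
    """
    if not enabled:
        return list(library)
    return [name for name, keywords in library.items()
            if keyword_ratio(query_keywords, keywords) >= threshold]


# Hypothetical library used only for illustration.
LIBRARY = {
    "set": ["collection", "unordered", "membership"],
    "stack": ["collection", "lifo"],
}
strict = keyword_filter(["collection", "membership"], LIBRARY, threshold=0.6)
relaxed = keyword_filter(["collection", "membership"], LIBRARY, threshold=0.5)
```

Lowering the threshold trades precision for recall, and disabling the filter forwards the whole library to the syntactic filters.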


B. SYNTACTIC MATCHING 


Syntactic matching has been proposed as an effective method for quickly ruling
out components that cannot match the query [13]. The process can be successfully
automated when syntactic normalizing procedures are applied. Syntactic normalization
procedures come in many forms [2][6][13] but perhaps the most promising approach
proposed recently is the application of syntactic profiles [9]. This approach is discussed
and improved upon in section IV of this thesis.

The presence of subtypes in queries and components has often plagued syntactic
matching by imposing penalties in recall. For example, if the query is an operation that
takes a positive as an input and the operations in the software base
contain only integer inputs, then the query will fail even though positives are legitimate
subtypes of integers. This shortcoming is addressed in [2] and further refined in [9].


C. SEMANTIC MATCHING


A major shortcoming of syntactic matching is its inability to retrieve components
based on their behavior. If syntactic matching were the sole approach to search and
retrieval, a query for a square-root function would indeed return a square-root function
but, to the user’s dismay, most of the other math functions in the software base would be
returned as well! Recent efforts have attempted to address this shortcoming through
various approaches to semantic matching. Specification-based approaches found in [6]
and [8] require the user to form queries as behavioral specifications but have not met
with great success due to the difficulty of forming correct specifications.

The approach of using algebraic specifications [13] for encoding a component’s
behavior has led to promising results for successfully automating semantic matching [9].
A set of ground equations describing the component’s behavior can be specified
algebraically using algebraic specification languages such as OBJ3 [3] and included with
the component. The terms in the equations can be applied from left to right to simplify
them to their canonical form where they can then be easily compared to a query’s set of
ground equations. Algebraic specifications, however, are not much of an improvement
with regard to ease of use. Specification languages such as OBJ3 have to be learned
by the librarian and the user, and the domain of the component needs to be understood. A
librarian faces a cumbersome task when preparing an entire software base for
this type of semantic matching [10].


D. MULTI-LEVEL FILTERING 


Multi-level filtering [9] is an approach that integrates keyword, syntactic, and
semantic matching. It is attractive because it applies a series of increasingly stringent
filters that move along a continuum of high-recall, low-precision syntactic techniques
towards the more computationally expensive, high-precision semantic techniques. The
purpose of the work described in this thesis is to improve upon the syntactic matching
processes of multi-level filtering. Hence, discussion of the specifics of the
multi-level filtering approach is deferred to the relevant sections of this
document.


E. SOFTWARE BASE DESIGN AND POPULATION 


Populating the software base usually involves annotating the components with 
additional information to facilitate search and retrieval. In every approach cited above 
this is the case. PSDL [5] has been shown to be an effective language for representing 
components independently of their native language [10]. In addition to its real-time 
specification support, PSDL supports operations (including generic operations), abstract 
data types (including generic types), state machines, and the common predefined types 
found in most popular programming languages. Thus PSDL is more than sufficient for
representing the syntactic properties of queries and reusable components in a software
base. PSDL also provides a placeholder for axioms that supply semantic information for
the component. Algebraically specified ground equations in the form of OBJ3, for
instance, can be placed in this section of the PSDL file.

CAPS [7], a CASE tool for rapid prototyping of embedded hard real-time 
systems, represents great strides in integrating modern software engineering technologies. 
The system includes a graphical editor, an execution support system, an evolution control 
system, automated real-time schedulers, automated integration of Ada modules, and 
placeholders for making use of a software base. Its initial software base [10] includes 
reusable components from the Booch library. The components include syntactic 
specifications in PSDL and semantic specifications in OBJ3, thereby providing a good test
suite for multi-level filtering and the ideas proposed in this thesis.


III. MULTI-LEVEL FILTERING ARCHITECTURE


The model of multi-level filtering is illustrated in Figure 1. The entire process can
be generalized into two main activities: syntactic matching and semantic matching.
Syntactic matching quickly filters out candidates based on syntactic properties in order to
minimize the number of candidates that must undergo the computationally
expensive semantic matching. Clearly it is advantageous to filter out large numbers of
candidates early to minimize the use of the more laborious filters later in the process. At
any stage of the process the user should be able to set the thresholds that determine the
constraints within which a candidate may pass. Furthermore, the user should be able to
browse the set of candidates from the prior filters and have the option of manually
filtering the results that are passed to the next filter.
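The staged structure described above, with cheap filters first and an optional manual review between stages, can be sketched as a simple chain. This is only an illustration of the control flow: the function name `run_filters`, the `review` hook, and the placeholder predicates are assumptions, and the real filters of [9] are far richer than boolean predicates.

```python
def run_filters(candidates, filters, review=None):
    """Pass candidates through each (name, predicate) filter in order.

    `review`, if given, is called with (filter_name, survivors) after each
    stage and returns the subset to forward; it stands in for the user
    browsing and manually pruning intermediate results.
    """
    survivors = list(candidates)
    for name, keep in filters:
        survivors = [c for c in survivors if keep(c)]
        if review is not None:
            survivors = list(review(name, survivors))
        if not survivors:
            break  # nothing left for the more expensive filters
    return survivors


# Placeholder predicates: the second one records what it is asked to examine,
# showing that the costlier stage only sees survivors of the cheaper one.
examined = []

def cheap_filter(candidate):
    return candidate % 2 == 0

def costly_filter(candidate):
    examined.append(candidate)
    return candidate > 2

result = run_filters(range(10), [("profile", cheap_filter),
                                 ("signature", costly_filter)])
```

Ordering the stages from cheapest to most expensive means each later stage runs on a smaller candidate set.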


[Figure: Syntactic Matching (Profile Filtering, then Signature Matching) followed by Semantic Matching (Ground Equation Checking)]

Figure 1: Multi-level Filtering Model
This thesis focuses on improving syntactic matching by making improvements to
profile filtering and signature matching. Section IV discusses improvements to profile
filtering and section V discusses improvements to signature matching. In addition to the
presentation of theoretical improvements, each section also covers a detailed design and
implementation for realizing a software module that can be practically used within the
entire context of multi-level filtering and ultimately in CAPS.





IV. PROFILE FILTERING 


A. CURRENT STATE OF THE ART 


In [9] a component’s syntactic properties are represented as a Component Profile.
A component profile is the multiset of Operation Profiles for all the operations in a
component. An operation profile is a sequence of integers, each representing a unique
syntactic property² of an operation. Definition 4 in [9] defines an operation profile as:

1. The first integer is the total number of occurrences of sorts.
2. If the total number of sort groups, N, is greater than 0, then the second to (1+N)th
   integers are the cardinalities of the sort groups, in descending order.
3. The (2+N)th integer is the cardinality of the unrelated sort group.
4. The (3+N)th integer is:
   0 if the value sort is different from any of the argument sorts; and
   1 if the value sort belongs to some sort group.


By computing the component profiles for each reusable component in the software base,
components can be placed into partitions where each partition is identified by the
component profile of the components it contains. An ordering of these partitions can then
be obtained to organize the software base into a Hasse diagram for facilitated traversal
during a process [9] defines as Profile Filtering.

Profile filtering is a process in which components in the software base can be
easily ruled out based on whether their syntactic profiles match the query’s syntactic
profile. This is a high-speed (relative to signature and semantic matching) process where
the goal is to increase precision in a typically high-recall/low-precision stage of retrieval.


² These properties have been referred to as profile components but we will use the term property rather than
component to eliminate an overloading of the term component which we have been using to refer to a
reusable component such as a type.


B. PROFILE IMPROVEMENTS 


One way to increase precision in [9]’s approach to profile filtering is to make
improvements to the definition of an operation profile. Two categories of improvement
that can be easily quantified are Resolution and Time-and-Space.


1. Resolution


The point of increasing the resolution of syntactic profiles is to better distinguish
between syntactically similar software components. In terms of [9]’s architecture this
would result in an increase in the number of partitions in the software base. In terms of
[9]’s profile filtering process this would mean an increase in the number of nodes in the
Hasse diagram that maps the software base’s organization.

Gains in resolution can be obtained two ways:

1. Add more properties to the profile.
2. Use properties that can be measured with more possible values.

In keeping with the spirit of syntactic normalization, however, one has to be careful to
define measurements that will not be affected by the permutation of the arguments or by
any renaming of the types.

[1] inspired several resolution improvements to profiles that can prove quite
useful in partitioning the software base more effectively. The first improvement follows
the second resolution-gain technique described above and the other improvements
subscribe to the first technique.


a. Value Sort Frequency


Item 4 of [9]’s operation profile definition has two possible values, 0 or 1,
indicating if the value sort³ is in the same sort group as other arguments in the operation
or if it is a member of the unrelated sort group. The resolution of this particular property
can be increased by modifying its definition to be the number of occurrences of the value
sort in the operation’s signature. Table 1 illustrates this improvement:


Table 1

Component Operations
add: id set -> set
union: set set -> set
member: id set -> bool
choose: set -> id





The increase in resolution is well illustrated with the operations add and
union. When using the old definition we notice that add and union have the same value
for the value-sort property. When using the new definition we see that add and union
each have different measurements for this property. Such a difference guarantees that
add and union will have different profiles and therefore contributes to an increased
resolution of the software base.
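The difference between the old and new value-sort properties for the set operations can be checked with a few lines of code. This is a sketch under stated assumptions: signatures are represented as a list of argument sorts plus a value sort, the function names are hypothetical, and the old property is read simply as "does the value sort also occur among the arguments".

```python
def old_value_sort_property(arg_sorts, value_sort):
    """Item 4 of the original definition, read as: 1 if the value sort
    also occurs among the argument sorts, otherwise 0 (assumed reading)."""
    return 1 if value_sort in arg_sorts else 0


def value_sort_frequency(arg_sorts, value_sort):
    """New property: occurrences of the value sort in the whole signature,
    counting the arguments and the value position itself."""
    return list(arg_sorts).count(value_sort) + 1


# The set operations listed in Table 1: (argument sorts, value sort).
SET_OPERATIONS = {
    "add": (["id", "set"], "set"),
    "union": (["set", "set"], "set"),
    "member": (["id", "set"], "bool"),
    "choose": (["set"], "id"),
}

old_values = {name: old_value_sort_property(*sig)
              for name, sig in SET_OPERATIONS.items()}
new_values = {name: value_sort_frequency(*sig)
              for name, sig in SET_OPERATIONS.items()}
```

Under the old property add and union are indistinguishable, while the frequency count separates them. Both functions ignore argument order and sort names, so they respect the normalization constraint stated earlier.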


b. Type Sort Frequency 


A majority of the reusable components the author has come across have
been abstract data types. In most cases these types refer to themselves in the operations
they define. For instance, in Table 1 the component is a set and one will notice the
operations refer to set frequently. The frequency of such self-references can be measured
and can contribute to the component’s profile. Table 2 illustrates the additional property:


³ The term value sort is used by [9] to refer to the type of the output argument of an operation. In this
thesis the terms value and output are used interchangeably but an effort to use value when referring to
concepts in [9] will be made.


Table 2

Component Operations
add: id bag -> bag
merge: bag bag -> bag
equal: bag bag -> bool
equal_with_set: bag set -> bool
member: id bag -> bool
freq: id bag -> natural





The new property measures the number of times bag is referred to in the
operation’s signature. Notice that equal and equal_with_set are assigned different values
for this new property. In the old profile definition, these two operations would have the
same profile. Again, we have an improvement in resolution.
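The type sort frequency can be computed directly from a signature once the component's own type name is known. The sketch below uses a hypothetical function name and the same assumed signature representation (argument sorts plus a value sort); the operations are those listed in Table 2.

```python
def type_sort_frequency(component_type, arg_sorts, value_sort):
    """Number of times the component's own type occurs in the signature."""
    return (list(arg_sorts) + [value_sort]).count(component_type)


# The bag operations listed in Table 2: (argument sorts, value sort).
BAG_OPERATIONS = {
    "add": (["id", "bag"], "bag"),
    "merge": (["bag", "bag"], "bag"),
    "equal": (["bag", "bag"], "bool"),
    "equal_with_set": (["bag", "set"], "bool"),
    "member": (["id", "bag"], "bool"),
    "freq": (["id", "bag"], "natural"),
}

self_references = {name: type_sort_frequency("bag", *sig)
                   for name, sig in BAG_OPERATIONS.items()}
```

Here equal and equal_with_set receive different counts, which is the distinction the new property is meant to capture.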


c. Predefined Sort Frequencies


The final resolution improvement to introduce involves representing the
sizes of the various sort groups for the predefined⁴ types. [9] and [2] both note that during
signature matching the predefined types can only map to predefined types of the same
sort group.⁵ Given this requirement, it would be beneficial to filter out components
during profile filtering that would violate such a requirement. Hence we can add an
integer for each predefined sort group that would reflect the size of that sort group in the
operation’s signature. In Table 3, five predefined sort groups are recognized in the
following order: boolean, character, string, integer, and real.


Table 3

Component Operations
add: id bag -> bag
merge: bag bag -> bag
member: id bag -> bool
freq: id bag -> natural





The operations member and freq are good examples of the increased
resolution this improvement provides. The old profile definition would assign these two
operations the same profile. The enhancement assigns them different profiles, thereby
increasing the resolution and sparing the signature-matching algorithm from trying to
map incompatible predefined types.

⁴ [9] refers to predefined types as basic types. The two terms are used interchangeably in this thesis.
⁵ [2] further restricts this statement with rules regarding subtype matching within the sort group. This is
addressed in the section on Signature Matching where the discussion is more applicable.
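The five predefined sort-group counts can be sketched as below. The group order follows the text (boolean, character, string, integer, real); the mapping from concrete type names to groups, including the placement of natural and positive in the integer group, is an assumption made for illustration, as are the function and constant names.

```python
PREDEFINED_GROUPS = ("boolean", "character", "string", "integer", "real")

# Assumed membership of type names in the predefined sort groups.
GROUP_OF = {
    "bool": "boolean", "boolean": "boolean",
    "char": "character", "character": "character",
    "string": "string",
    "integer": "integer", "natural": "integer", "positive": "integer",
    "real": "real", "float": "real",
}


def predefined_sort_frequencies(arg_sorts, value_sort):
    """One count per predefined sort group, in PREDEFINED_GROUPS order."""
    counts = dict.fromkeys(PREDEFINED_GROUPS, 0)
    for sort in list(arg_sorts) + [value_sort]:
        group = GROUP_OF.get(sort.lower())
        if group is not None:  # user-defined sorts are not counted here
            counts[group] += 1
    return [counts[group] for group in PREDEFINED_GROUPS]


member_counts = predefined_sort_frequencies(["id", "bag"], "bool")
freq_counts = predefined_sort_frequencies(["id", "bag"], "natural")
```

With these counts appended to the profile, member and freq no longer share a profile, so a query returning a boolean is never handed to signature matching against an operation returning an integer subtype.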


2. Time-and-Space


Software bases can become enormous rather quickly. A large enterprise’s
software base can contain thousands of reusable components. Representing such a large
software base in the architecture proposed by [9] can tax the resources of the enterprise’s
computers responsible for maintaining and searching the software base. To this end, the
representation of syntactic profiles is an issue worth special attention since thousands of
components can actually translate into tens of thousands of operations!

[9] suggests the operation profile be represented as a sequence of integers. This
requires a sequence abstract data type with standard operations defined such as equality
and less-than (for sorting). Numerous instantiations of such an abstract data type could
require a substantial amount of memory. Two suggestions for making time-and-space
improvements to syntactic profile representation are explained below.


a. Large Integer Representation 


A representation that would take up less space is a large integer of
something like 64 bits. Each decimal digit of the integer would represent one integer in the
profile. Besides saving space, testing for equality and less-than would
be much faster because the built-in integer operations would apply, thereby
eliminating the overhead of calling a user-defined function each time these
common operations are needed.

The biggest disadvantage to this approach should be evident: such a
representation would limit the number of sort occurrences in the signature to nine. A
function with ten sort occurrences is rather rare, however. One could use two digits for
each property, thereby relaxing the restriction to 99 sort occurrences, which is
definitely enough. Two digits per property, however, would require a much larger integer
than one that could be represented with 64 bits, and it is questionable whether any high-level
language can represent greater-than-64-bit integers any more efficiently than a
smart implementation of a sequence.
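The digit-packing idea and its limits can be made concrete with a short sketch. The function name and the `digits_per_property` parameter are assumptions for illustration. Python integers are unbounded, so the 64-bit limit is checked explicitly rather than enforced by the language.

```python
def pack_profile(profile, digits_per_property=1):
    """Pack a profile into one integer, one fixed-width decimal field per
    property. Raises ValueError if a property does not fit in its field."""
    base = 10 ** digits_per_property
    packed = 0
    for value in profile:
        if not 0 <= value < base:
            raise ValueError(f"property value {value} needs more than "
                             f"{digits_per_property} digit(s)")
        packed = packed * base + value
    return packed


SIGNED_64_BIT_MAX = 2 ** 63 - 1

add_packed = pack_profile([3, 2, 1, 2, 0, 0, 0, 0, 0, 2])
union_packed = pack_profile([3, 3, 0, 3, 0, 0, 0, 0, 0, 3])

# Worst case for ten two-digit properties: twenty decimal digits.
worst_two_digit = pack_profile([99] * 10, digits_per_property=2)
```

A ten-property profile with one digit per property fits comfortably in 64 bits, and equality of profiles becomes a single integer comparison. The two-digit worst case above is a twenty-digit number, which exceeds even an unsigned 64-bit integer, illustrating the trade-off described in the text.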


b. Profile Lookup Table 


A component profile is traditionally thought of as a sequence of operation
profiles. In other words, it is a sequence of sequences of integers. Given thousands of
components, this can take up a lot of space and can tax the component profile equality
operations. Especially wasteful is the fact that the number of unique operation profiles is
much smaller than the actual number of components in the software base.

A promising approach for improving the time-and-space issues of
component profiles is the employment of a profile lookup table. To eliminate the
redundancy of integer sequences that represent operation profiles throughout the software
base, this table would map a unique integer to each unique operation profile used in the
software base. A component profile can then be represented as a sequence of these unique
integers rather than a sequence of integer sequences. Below is an example to illustrate
the concept:


Table 4: Profile Lookup Table

Lookup ID    Operation Profile
1            [2,1,2,1,0,0,0,0,0]
2            [3,1,3,1,0,0,0,1,0]
3            [3,1,3,1,1,0,0,0,0]
4            [3,2,1,2,0,0,0,0,0,2]
5            [3,3,0,3,0,0,0,0,0,3]





Table 5: Set

Component Operations        Operation Profiles         Lookup ID
add: id set -> set          [3,2,1,2,0,0,0,0,0,2]      4
union: set set -> set       [3,3,0,3,0,0,0,0,0,3]      5
member: id set -> bool      [3,1,3,1,1,0,0,0,0]        3
choose: set -> id           [2,1,2,1,0,0,0,0,0]        1
Table 6: Bag

Component Operations        Operation Profiles         Lookup ID
add: id bag -> bag          [3,2,1,2,0,0,0,0,0,2]      4
merge: bag bag -> bag       [3,3,0,3,0,0,0,0,0,3]      5
member: id bag -> bool      [3,1,3,1,1,0,0,0,0]        3
freq: id bag -> natural     [3,1,3,1,0,0,0,1,0]        2


Table 4 depicts the profile lookup table after the set and bag components
from Table 5 and Table 6 have been loaded. The first thing to note from this example is
the redundancy in operation profiles between set and bag. Three out of the five unique
operation profiles in the lookup table are shared between set and bag. The second thing
to note is the huge space savings gained for a component profile. Without the lookup
table the set’s component profile would be [[3,2,1,2,0,0,0,0,0,2], [3,3,0,3,0,0,0,0,0,3],
[3,1,3,1,1,0,0,0,0], [2,1,2,1,0,0,0,0,0]]. By using the lookup table, the set’s component profile
can be represented as [4,5,3,1].⁶ Given thousands of components the amount of space
saved is significant. Furthermore, the amount of time saved checking for component
profile equality can be substantial since the number of actual integer comparisons is cut
drastically.

The profile lookup table represents an optimal time-and-space solution to
profile filtering. During profile filtering, the actual profiles themselves are irrelevant.
What is relevant is whether two profiles are the same. The profile lookup table ensures
that each profile is represented by a unique identifier. Since this identifier can be
represented by an integer we have an optimal time-and-space solution to profile
representation.
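A profile lookup table of this kind amounts to interning each distinct operation profile. The sketch below is one possible realization and not the thesis's Ada design: the class and function names are hypothetical, and IDs are assigned in insertion order, so they differ from the particular IDs shown in Table 4. The operation profiles themselves are copied from Tables 5 and 6.

```python
class ProfileLookupTable:
    """Maps each distinct operation profile to a small integer ID."""

    def __init__(self):
        self._ids = {}  # profile (as a tuple) -> lookup ID

    def add_profile(self, profile):
        """Return the ID for `profile`, assigning the next free ID if new."""
        key = tuple(profile)
        if key not in self._ids:
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]

    def __len__(self):
        return len(self._ids)


def component_profile(table, operation_profiles):
    """Represent a component as the lookup IDs of its operation profiles."""
    return [table.add_profile(p) for p in operation_profiles]


SET_OPS = [[3, 2, 1, 2, 0, 0, 0, 0, 0, 2], [3, 3, 0, 3, 0, 0, 0, 0, 0, 3],
           [3, 1, 3, 1, 1, 0, 0, 0, 0], [2, 1, 2, 1, 0, 0, 0, 0, 0]]
BAG_OPS = [[3, 2, 1, 2, 0, 0, 0, 0, 0, 2], [3, 3, 0, 3, 0, 0, 0, 0, 0, 3],
           [3, 1, 3, 1, 1, 0, 0, 0, 0], [3, 1, 3, 1, 0, 0, 0, 1, 0]]

table = ProfileLookupTable()
set_profile = component_profile(table, SET_OPS)
bag_profile = component_profile(table, BAG_OPS)
shared_ids = set(set_profile) & set(bag_profile)
```

After both components are loaded the table holds five entries, three of which are shared, and each component profile shrinks from four integer sequences to four integers.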


C. DESIGN AND IMPLEMENTATION


The software used in [9] is not very conducive to reusability and extensibility and
therefore is difficult to use for testing the improvements in syntactic matching outlined in
this thesis. Furthermore, it is desirable to have a software module that is practical for
inclusion in CAPS. To this end, a significant amount of design and implementation is
necessary. This section details the various data types and implementation strategies used
to implement a practical system to test the improvements proposed in this thesis with the
understanding that such a system should ultimately integrate with other elements of
multi-level filtering and CAPS in the large.


1. Component Types


[13] proposed components ultimately be stored in an object-oriented database to 
easily associate the various elements of a software component required for search and 
retrieval. This idea is highly appropriate for a production quality implementation of a 
software base but given the lack of engineering resources at this stage of the research 
such an idea has not yet come to fruition. The CAPS software base is currently 
composed of a set of files for each component where each file for the component 
represents a different element of the component that is useful for reuse [10]. Specifically 
this includes the component’s native language (e.g. Ada) specification, native language 
body, PSDL specification, and OBJ3 specification. 

The first task, then, is to organize these files into an intelligent scheme to support
the goals of this thesis and the short-term goals of the CAPS project. The organization
proposed here is to create a directory for each component that contains all of its files.
Figure 2 illustrates examples of these directories.


/CAPS/sb/set/        /CAPS/sb/map/
    set.psdl
    set_s.a
    set_b.a
    set.obj3

Figure 2: Sample Directories for a Software Base


A header file is used to identify all of the components that comprise the software base.
An example of such a header file is shown in Figure 3. Notice that a unique integer is
assigned to each component. This ID will be used to identify the component in the data
structures that internally represent the software base because it is easier to manipulate and
it saves space. A nice feature the header file provides is the ability to represent a
distributed software base through the use of a networked file system. Notice components
1100 and 1400 are components that actually exist on remote machines.

⁶ The component profile in this example is not ordered but could be for improved signature matching. A
discussion of this can be found in section V.B.1.


1000 /CAPS/sb/set
1100 /net/pegasi/comp_lib/sequence
1200 /CAPS/sb/trig
1300 /CAPS/sb/map
1400 /net/taurus/CAPS/sb/stack

Figure 3: Sample Header File for a Software Base
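Reading a header file in the format of Figure 3 is straightforward. The sketch below is illustrative only: the function name is hypothetical, and the assumed format is one component per line, with an integer ID followed by whitespace and a directory path.

```python
def parse_header(text):
    """Parse header-file text into a dict of component ID -> directory path."""
    components = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        id_text, path = line.split(None, 1)
        component_id = int(id_text)
        if component_id in components:
            raise ValueError(f"duplicate component ID {component_id}")
        components[component_id] = path
    return components


# The sample header file from Figure 3.
HEADER = """\
1000 /CAPS/sb/set
1100 /net/pegasi/comp_lib/sequence
1200 /CAPS/sb/trig
1300 /CAPS/sb/map
1400 /net/taurus/CAPS/sb/stack
"""

software_base = parse_header(HEADER)
remote = sorted(cid for cid, path in software_base.items()
                if path.startswith("/net/"))
```

Treating paths under /net/ as remote is a convention read off the figure, not a rule stated in the text.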


Now that we have a way of representing the components in secondary storage we
need a way of representing them internally. Figure 4 shows the objects used to represent
components in memory using Rational’s Unified Method [12].
[Figure: class diagram. Component (attribute psdl_filename: string; operations createComponent, addGenericsMapping, componentEqual) with a generics_mapping association to GenericsMap<psdl_id, psdl_id, eq, eq> (generic_map_pkg); ComponentIDMap<ComponentID, Component, "=", componentEqual> (generic_map_pkg) keyed by ComponentID.]

Figure 4: Component Types


The ComponentIDMap maps the ComponentID to a Component. The ComponentID is
the unique integer read in from the software base’s header file. The Component contains
the filename of the component’s PSDL specification and an association to an instance of 
a GenericsMap. A GenericsMap maps the generic parameter identifiers in generic 
components to actual type names. This needs some explanation. 


Suppose we have the following PSDL specification for the generic component
Stack:

    TYPE Stack
    SPECIFICATION
      GENERIC
        Item : PRIVATE_TYPE

      OPERATOR Push
        SPECIFICATION
          INPUT
            The_Item : Item,
            On_The_Stack : Stack
          OUTPUT
            On_The_Stack : Stack
          EXCEPTIONS
            Overflow, Underflow
        END

      OPERATOR Pop
        SPECIFICATION
          INPUT
            The_Stack : Stack
          OUTPUT
            The_Stack : Stack
          EXCEPTIONS
            Overflow, Underflow
        END

      OPERATOR Depth_Of
        SPECIFICATION
          INPUT
            The_Stack : Stack
          OUTPUT
            Result : Natural
          EXCEPTIONS
            Overflow, Underflow
        END

      OPERATOR Is_Empty
        SPECIFICATION
          INPUT
            The_Stack : Stack
          OUTPUT
            Result : Boolean
          EXCEPTIONS
            Overflow, Underflow
        END
    END


This component has one generic parameter named Item and makes reference to three
different types: Stack, Natural, and Boolean. Instantiating Item to the different types
used in the component can potentially yield a different component profile for each
instantiation. This could place the various instantiations into different partitions. Hence,
each generic component must undergo the generic instantiation process to obtain the
various generic parameter mappings. Each instantiation is stored internally as a separate
component with its unique generic mapping. The ComponentID for each instantiation is
based on the base ID from the header file. For example, if the header file assigns the ID
1200 for the stack component listed above then the ComponentID entries in the
ComponentIDMap would be 1201, 1202, 1203, and 1204. Table 7 illustrates this
mapping.


Table 7

ComponentID    Component.generic_mapping
1201           Item -> Stack
1202           Item -> Natural
1203           Item -> Boolean
1204           Item -> Item





Notice there is a fourth entry for mapping Item to itself. This is a simple way of
representing the possibility that the generic parameter does not map to any of the types
used in the component. Another important point to note is that the IDs in the header file need
to be spaced sufficiently to give the generic instantiation algorithm room for the
automatic generation of unique IDs for a given component. The software base used to test
the ideas in this thesis was given a spacing of 100 between component IDs, which
provided sufficient room for generic instantiation.
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One final point to note regarding generic parameter instantiation is a single 
generic component can end up being instantiated into numerous components through the 
generic parameter instantiation process. This 1s especially true for components with more 
than one generic parameter because the cross-product of the generic parameters and the 
normal types in the component must be computed to exhaust all the possible 
combinations. Measurements regarding the instantiation of generic components are 


presented in section VI.A. 
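
The enumeration of instantiations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name generic_mappings is hypothetical, each generic parameter is assumed to map either to one of the normal types in the component or to itself, and ids are assumed to be numbered consecutively from the base id as in the stack example. 

```python
from itertools import product

def generic_mappings(base_id, generic_params, component_types):
    """Enumerate every mapping of generic parameters to concrete types.

    Each generic parameter may map to any type used in the component or
    to itself (meaning it matches none of them). Ids are assigned as
    base_id + 1, base_id + 2, ... (an assumed numbering scheme).
    """
    choices = [list(component_types) + [param] for param in generic_params]
    result = {}
    for offset, combo in enumerate(product(*choices), start=1):
        result[base_id + offset] = dict(zip(generic_params, combo))
    return result

# One generic parameter, three normal types: four instantiations.
stack = generic_mappings(1200, ["Item"], ["Stack", "Natural", "Boolean"])

# Two (hypothetical) generic parameters: the cross-product gives 4 * 4 = 16.
two_params = generic_mappings(1300, ["Key", "Value"], ["Map", "Natural", "Boolean"])
```

With a single parameter the sketch yields the four entries of Table 7. With two parameters the count already reaches 16, which suggests why the spacing between base ids has to be generous. 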


2. Profile Types 


The natural design for profiles and component profiles is to use sequences. A 
Profile would be implemented as a sequence of integers and a ComponentProfile would 
be implemented as a sequence of Profiles. This approach is depicted in Figure 5. 


[Figure 5 diagram not reproduced. Operations listed in the diagram: profileEqual<"=">, 
profileLessThan<"<">, componentProfileEqual<profileEqual>, 
componentProfileMember<profileEqual>, componentProfileRemove<profileEqual>, 
componentProfileSort<profileLessThan>, subbag<profileEqual>, addProfile, addProfiles.] 


Figure 5: Profile Data Types 


The method profileLessThan provides a means of ordering the Profiles lexicographically 
in the ComponentProfile. The advantages of such an ordering are detailed in section 
V.B.1. The method subbag is a multiset subset operation that can be used to order the 
partitions in the haase diagram since the partitions are keyed using ComponentProfiles. 
The design and implementation of the haase diagram is discussed in section IV.C.3. 
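
The two operations can be illustrated with a minimal Python sketch, assuming a Profile is represented as a tuple of integers and a ComponentProfile as a list of such tuples. The function names mirror the methods named above but are otherwise illustrative, and the example profile values are made up. 

```python
from collections import Counter

def subbag(small, large):
    """Return True if every profile in `small` occurs in `large`
    at least as many times (multiset inclusion)."""
    small_counts, large_counts = Counter(small), Counter(large)
    return all(large_counts[p] >= n for p, n in small_counts.items())

def component_profile_sort(component_profile):
    """Order profiles lexicographically; Python compares tuples this way."""
    return sorted(component_profile)

# Made-up profiles, used only to exercise the two functions.
cp_small = [(3, 1, 3), (2, 0, 2)]
cp_large = [(3, 1, 3), (2, 0, 2), (3, 1, 3)]

is_sub = subbag(cp_small, cp_large)       # every profile of cp_small is covered
not_sub = subbag(cp_large, cp_small)      # cp_small has only one (3, 1, 3)
ordered = component_profile_sort(cp_large)
```

Counting occurrences rather than testing plain set membership is what distinguishes a subbag from a subset: a component with two operations of the same profile is not covered by a component that has only one. 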
Section IV.B.2 introduced time-and-space improvements to the design in Figure 
5. These improvements are represented in Figure 6. 


[Figure 6 diagram not reproduced. Elements listed in the diagram: ProfileID, represented 
as a Long_Long_Integer, with profileLessThan; ComponentProfile<ProfileID, 4> with 
componentProfileEqual<profileEqual>, componentProfileMember<profileEqual>, 
componentProfileRemove<profileEqual>, componentProfileSort<profileLessThan>, 
subbag<profileEqual>, addProfile, and addProfiles; generic_map_pkg; and 
ProfileLookupTable<ProfileID, Profile, "=", "="> with createProfileLookupTable and 
addProfile.] 


Figure 6: Time-and-Space Improvements for Profile Types 


This shows all of the improvements used together but it is possible through the use of 
abstract data types to mix and match the designs. For example, if one wanted to be able 
to handle operations with more than nine arguments (see section IV.B.2.a) then Profile 
could be implemented as a sequence of integers rather than the Long_Long_Integer and 
still be able to take advantage of the ProfileLookupTable. 


3. Haase Diagram Types 


The haase diagram can be constructed using the objects in Figure 7. 


[Figure 7 diagram not reproduced. Elements listed in the diagram: HaaseNode with the 
operations createHaaseNode, addComponent, addChild, haaseNodeEqual, and 
haaseNodeAssign, and the associations key (ComponentProfile), components 
(ComponentIDSet), and children (ComponentProfileSet); generic_map_pkg; and 
HaaseDiagram<ComponentProfile, HaaseNode, componentProfileEqual, haaseNodeEqual> 
with the operations createHaaseDiagram, addHaaseNode, addBaseNodes, connectNodes, 
and generateGML.] 


Figure 7: Haase Diagram Data Types 


A HaaseNode is a partition that is keyed by a ComponentProfile. The node contains 
components that have the same ComponentProfile as the key. Notice the components are 
a set of ComponentIDs rather than Components to save space. When access to the actual 
component is necessary the ComponentID can be used to fetch the component from the 
ComponentIDMap as described in section IV.C.1. The HaaseNode is related to other 
nodes (or partitions) through its children association. This association is implemented as 
a set of ComponentProfiles, which are the keys to the next partitions in the ordering. 
Relating nodes in this way allows the use of a map to represent the entire haase diagram. 
Direct access to partitions can be obtained by fetching with a ComponentProfile key. 

Constructing the haase diagram is a three step process. 

Step 1: for each component check if a node exists with that component’s 
        ComponentProfile. If it does then put that component in that node 
        (add it to the node’s components association). Otherwise add a 
        new node with the component’s ComponentProfile as the key 
        and put the component in it. 

Step 2: for each Profile in each HaaseNode’s key add a node to represent 
        a base node. This is accomplished by calling addBaseNodes on 
        the populated HaaseDiagram from step 1. 

Step 3: connect the nodes (set the children association for each node) 
        based on the following invariant from [9]: n2 is n1’s child if and 
        only if subbag(n1.key, n2.key) and there is no node n3 such that 
        subbag(n1.key, n3.key) and subbag(n3.key, n2.key). This is 
        accomplished by calling connectNodes on the populated 
        HaaseDiagram from step 2. 
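
The three steps can be sketched in Python as follows. This is a simplified illustration, not the thesis implementation: the diagram is a plain dictionary keyed by the sorted tuple of profiles, a base node is assumed to be keyed by a ComponentProfile containing a single Profile, and the invariant is read with n1, n2, and n3 being distinct nodes. The name build_diagram and the example data are hypothetical. 

```python
from collections import Counter

def subbag(a, b):
    """Multiset inclusion: every profile in a occurs in b at least as often."""
    ca, cb = Counter(a), Counter(b)
    return all(cb[p] >= n for p, n in ca.items())

def build_diagram(components):
    """components: dict of component id -> component profile (list of profiles).

    Returns a dict keyed by component profile (as a sorted tuple) whose
    values hold the component ids and the child keys of each node.
    """
    nodes = {}
    # Step 1: group components into nodes keyed by their component profile.
    for comp_id, profiles in components.items():
        key = tuple(sorted(profiles))
        nodes.setdefault(key, {"components": set(), "children": set()})
        nodes[key]["components"].add(comp_id)
    # Step 2: add a base node for every profile seen in any key (assumed:
    # a base node is keyed by a single-profile component profile).
    for key in list(nodes):
        for profile in key:
            nodes.setdefault((profile,), {"components": set(), "children": set()})
    # Step 3: n2 is a child of n1 iff n1.key is a subbag of n2.key and no
    # third node lies between them.
    keys = list(nodes)
    for k1 in keys:
        for k2 in keys:
            if k1 == k2 or not subbag(k1, k2):
                continue
            between = any(
                k3 not in (k1, k2) and subbag(k1, k3) and subbag(k3, k2)
                for k3 in keys
            )
            if not between:
                nodes[k1]["children"].add(k2)
    return nodes

# Made-up profiles a and b and hypothetical component ids.
a, b = (2, 0, 2), (3, 1, 3)
diagram = build_diagram({1201: [a], 1301: [a, b], 1401: [a, b, b]})
```

In this example the base node keyed by (b,) holds no components of its own but still links to the partition keyed by (a, b), so a query containing only profile b has an entry point into the diagram. 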


4. Candidate Types 


Candidates are the “currency” passed between the various stages of the multi-level 
filtering process. Figure 8 shows these data types. 


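[Figure 8 diagram not reproduced. Elements listed in the diagram: Candidate with the 
attributes profile_rank and keyword_rank, the operations candidateEqual, 
candidateLessThan, candidateAssign, and newCandidate, and the associations 
signature_matches (SigMatchNodeSet) and component_id (ComponentID); 
ordered_set_pkg; and CandidateSet<Candidate, candidateEqual, candidateLessThan>.] 


Figure 8: Candidate Data Types 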
A Candidate is a component with a ranking. The component is identified through the 
ComponentID association. The ranking is a combination of the results of profile filtering, 
keyword filtering, and signature matching; [9] calls this combination the KPS value. 
Each candidate can have multiple signature matches, each with a different signature rank, 
so an association to a SigMatchNodeSet is present (see section V.C). Notice the 
CandidateSet is an ordered set. The ordering is provided through the candidateLessThan 
method, which uses the KPS value to determine the ordering. 


5. Software Base Types 


The software base ties everything together. The software base object and the 
functional summary of its methods are shown in Figure 9. The initialize method is 
responsible for parsing the header file, loading the components’ PSDL specifications, 
generating the generics mappings for the generic components, computing the profiles, 
and populating the haase diagram and component id map. The software base also 
provides some methods for gathering statistics, including a method to generate a GML 
[4] file to graphically depict the haase diagram. Finally, the software base contains 
methods for profile filtering and signature matching. 


[Figure 9 diagram not reproduced. The software_base object has the associations 
the_component_id_map (ComponentIDMap) and the_haase_diagram (HaaseDiagram) 
and the following methods: 

    initialize(header_filename: string) 
    profileFilter(query_filename) return CandidateSet 
    signatureMatch(query_filename, Candidate) return Candidate 
    numPartitions return natural 
    numOccupiedPartitions return natural 
    numComponents return natural 
    generateGML(gml_filename: string)] 


Figure 9: Software Base Types and Functions 


6. Profile Filtering Strategy 


We now have an infrastructure with which to experiment and conduct profile 
filtering and have laid the groundwork for a signature matching implementation which is 
presented in section V.C. The profile filtering strategy laid out in [9] can now be applied 
to this design and is encapsulated in the method profileFilter. A high level expansion of 
the profileFilter method in the software base is shown in Figure 10. 


[Figure 10 diagram not reproduced. Elements listed in the diagram: profileFilter, taking 
a query filename and producing a CandidateSet; getComponentProfile, producing a 
ComponentProfile; findCandidates, producing a CandidateSet; signature matching and 
ranking; the_haase_diagram; the_component_id_map; and the header filename.] 


Figure 10: Decomposition of profileFilter 


profileFilter decomposes into two main functions: getComponentProfile and 
findCandidates. getComponentProfile reads a PSDL specified query, computes its 
ComponentProfile and passes it to findCandidates where the actual profile filtering takes 
place. The decomposition of getComponentProfile is shown in Figure 11. 
getComponentProfile has been designed to take a GenericsMap if the component it is 
processing is generic. Queries are assumed NOT to be generic,* so when processing a 
query the GenericsMap passed in to getComponentProfile can simply be empty. 


* [9] cites the handling of generic queries as a topic of future research. 


[Figure 11 diagram not reproduced; the only label recovered is createNumericSignatures.] 


Figure 11: Decomposition of getComponentProfile 


The algorithm to compute profiles was given as a class project in [1]. The 
approach taken by the author’s group was to have the algorithm use a language 
independent (including independence from PSDL) signature for greater reuse potential 
with other specification languages. The resulting signature, which is referred to as a 
numeric signature, is represented as an array of integers where each unique integer 
represents a different sort group and each entry in the array indicates to which sort group 
each argument belongs. Negative integers were used for generic sort groups and the array 
was terminated with a 0. For example, given that id from the component listed in Table 6 
is a generic parameter, the numeric signatures for each operation would be generated as 
listed in Table 8. 


Table 8: Numeric Signatures for Bag 


Component Operations         Numeric Signature 
add: id bag -> bag           [-1,1,1,0] 
merge: bag bag -> bag        [1,1,1,0] 
member: id bag -> bool       [-1,1,2,0] 
freq: id bag -> natural      [-1,1,2,0] 
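
The encoding in Table 8 can be reproduced with a short Python sketch. It assumes that every sort forms its own sort group, that non-generic groups are numbered 1, 2, ... in order of first appearance within the operation, and that generic parameters are numbered -1, -2, ... in the same way. The function name numeric_signature is illustrative and is not the createNumericSignatures method of the thesis. 

```python
def numeric_signature(sorts, generic_params=()):
    """Build a numeric signature for one operation.

    sorts: the argument sorts followed by the result sort.
    Non-generic sorts are numbered 1, 2, ... in order of first appearance;
    generic parameters are numbered -1, -2, ...; a 0 terminates the array.
    """
    groups = {}
    next_normal, next_generic = 1, -1
    signature = []
    for sort in sorts:
        if sort not in groups:
            if sort in generic_params:
                groups[sort] = next_generic
                next_generic -= 1
            else:
                groups[sort] = next_normal
                next_normal += 1
        signature.append(groups[sort])
    signature.append(0)
    return signature

# The bag operations of Table 8, with id as the generic parameter.
bag_ops = {
    "add": ["id", "bag", "bag"],
    "merge": ["bag", "bag", "bag"],
    "member": ["id", "bag", "bool"],
    "freq": ["id", "bag", "natural"],
}
signatures = {
    name: numeric_signature(sorts, generic_params={"id"})
    for name, sorts in bag_ops.items()
}
```

Note that member and freq receive the same signature even though their result sorts differ, because the integers identify sort groups only within a single operation. 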



















This format works fine for the profile definition in [9] and even for the value sort 
frequency improvement presented in section IV.B.1. The problem with this format arises 
when adding the other profile improvements that are concerned with measuring the 
frequency of predefined and user-defined sorts. The integers in the numeric signature do 
not carry with them sort identity. Hence the numeric signature was modified to contain 
these two profile improvement properties directly. The first integer after the original 
terminating 0 represents the type sort frequency and the remaining integers represent the 
frequency of the predefined sort groups. Also, the createNumericSignatures method was 
modified to take a GenericsMap to create a numeric signature with the generic parameters 
instantiated and therefore remove any negative integers representing generic parameters. 
The improvements are represented in Table 9 and assume id is mapped to a boolean. 


Table 9: Improved Numeric Signatures 


Component Operations         Numeric Signature 
add: id bag -> bag           [1,2,2,0,2,0,0,0,0,0] 
merge: bag bag -> bag        [1,1,1,0,3,0,0,0,0,0] 
member: id bag -> bool       [1,2,1,0,1,2,0,0,0,0] 
freq: id bag -> natural      [1,2,3,0,1,0,0,0,1,0] 




















With these improvements to the numeric signatures we can now develop an algorithm to 
compute profiles for generic components with all the improvements presented in this 
thesis from a language independent format. This algorithm is represented by the function 
computeProfile and its source code can be found in the appendix. 


V. SIGNATURE MATCHING 


A. CURRENT STATE OF THE ART 


[9] proposes a strategy for signature matching that involves the discovery of 
Partial Signature Maps. A partial signature map maps operations and sorts from the 
query to operations and sorts in the candidate. The signature maps are called partial 
because it is possible that not all of the query’s operations can be mapped to operations in 
the candidate component. A signature map that successfully maps all of the query’s 
operations is considered a Full Signature Map. 

In [9] syntactic profiles play an important role in signature matching. Their use in 
profile filtering prevents syntactically incompatible components from being passed on to 
signature matching, but most importantly they provide a quick test for determining which 
operations in the query and the candidate have the potential for matching. Simply stated, 
signature matching is only performed on operations that have equal operation profiles. 


B. IMPROVEMENTS 


Signature matching becomes expensive as the sizes of the query and the candidate 
grow. More specifically, the number of possible operation pairings grows exponentially 
as the number of syntactically compatible operations (operations with equal syntactic 
profiles) increases. To compound the problem, the number of possible sort matches for 
each pairing grows exponentially as the number of sort occurrences increases. These 
combinatorial explosions can result in large search spaces. Table 10 illustrates the 
problem: 


Table 10 


Query Operations      Operation Profiles      Component Operations 
Q1: E A D -> B        [4,1,4,0,0,0,0,0,0]     C1: V W Y -> Z 
Q2: A B C D -> F      [5,1,5,0,0,0,0,0]       C2: J K L M -> Y 
Q3: D B C A -> E      [5,1,5,0,0,0,0,0]       C3: W X Y Z -> T 
                      [5,1,5,0,0,0,0,0]       C4: W X Y Z -> S 
























The query operation Q1 can only match to C1 because C1 is the only operation in the 
component that has a compatible operation profile. Q2, however, can match to C2, 
C3 and C4. Furthermore Q3 can also match to C2, C3, and C4. Before sort matching 
occurs we already have many possible combinations of operation pairings to test. The 
problem really explodes as the sorts for each of these possible pairings undergo the 
matching process. For the Q2/C2 pairing, A can match to J, K, L or M. For each of these 
possibilities B must then be matched to the remaining types in C2. This continues until 
all the possibilities are permuted for the Q2/C2 pairing. And this is just for the Q2/C2 
pairing! 

Below are several improvements that can be made to combat the combinatorial 
explosion problems associated with matching large components. 
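
To make the size of the problem concrete, the following Python sketch counts the unpruned possibilities for Table 10. The profile labels P4 and P5 are hypothetical abbreviations of the two distinct profiles in the table, and the sort count is a naive upper bound that ignores the constraints shared sorts impose across operations. 

```python
from itertools import permutations
from math import factorial

# Each operation: (profile label, number of input sorts). The labels are
# hypothetical abbreviations of the profiles listed in Table 10.
query = {"Q1": ("P4", 3), "Q2": ("P5", 4), "Q3": ("P5", 4)}
candidate = {"C1": ("P4", 3), "C2": ("P5", 4), "C3": ("P5", 4), "C4": ("P5", 4)}

def compatible(q_op, c_op):
    """Operations are compatible when their profiles are equal."""
    return query[q_op][0] == candidate[c_op][0]

def operation_maps():
    """Yield every assignment of all query operations to distinct
    candidate operations with equal profiles."""
    q_ops = list(query)
    for c_ops in permutations(candidate, len(q_ops)):
        if all(compatible(q, c) for q, c in zip(q_ops, c_ops)):
            yield dict(zip(q_ops, c_ops))

maps = list(operation_maps())

# Naive upper bound on input-sort orderings for one full operation map,
# before any pruning: the product of (number of inputs)! over the pairs.
sort_orderings = 1
for q_op in query:
    sort_orderings *= factorial(query[q_op][1])
```

Even this small example has six complete operation maps and, per map, up to 3! * 4! * 4! = 3456 input orderings to consider if nothing is pruned. 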


1. Operation Ordering 


[9] suggests ordering the operations in the query and components by their 
syntactic profiles as a possible improvement to signature matching. This would allow the 
signature matching algorithm to sequentially step through the operations for matching 
and reduce the number of combinations to be considered. 

The signature matching algorithm presented in this thesis uses the concept of 
operation ordering to help constrain the search by matching smaller operations before 
larger operations. By ordering profiles lexicographically, the smallest operations would 
be the operations with profiles that come first in the ordering. For example, in Table 10 
Q1 is smaller than Q2 because it contains fewer sort occurrences. This is indicated by the 
first property in the profile and therefore Q1’s profile is ordered before Q2’s. Given this 
ordering, we can intelligently match the sorts for smaller operations before matching the 
sorts for larger operations. This is advantageous because smaller operations constrain the 
number of matching possibilities and therefore can contribute to a quick reduction of the 
search space. This is explored in greater detail in the section on design and 
implementation for signature matching. 

Additional techniques for reducing the search space that are not dependent on 
operation orderings are considered next. 


2. Match Outputs 


When a query operation is mapped to a candidate operation we can immediately 
attempt to map the value sorts of the operations because all operations are normalized to 
the point of having a single output [9][10]. This reduces the search space in two ways. 
First, if either of the value sorts is already mapped (because of a previous operation 
mapping) then it is possible the operations cannot be mapped. This would be based 
simply on the fact that the value parameters are already mapped to different sorts. Table 
11 illustrates this concept: 


Table 11 


Query Operations      Operation Profiles      Component Operations 
Q1: E A D -> B        [4,1,4,0,0,0,0,0,0]     C1: V W Y -> Z 
Q2: A B C D -> B      [5,1,5,0,0,0,0,0]       C2: J K L M -> Y 
Q3: D B C A -> E      [5,1,5,0,0,0,0,0]       C3: W X Y Z -> T 
                      [5,1,5,0,0,0,0,0]       C4: W X Y Z -> S 





Suppose we map Q1 to C1. This would mean B would have to map to Z. Now we move 
to Q2. Suppose we attempt to map Q2 to C2. This would mean B would have to map to 
Y but this is illegal because B was already mapped to Z! Thus we can immediately prune 
this branch of the search space and try to map Q2 to C3, which coincidentally will not 
work either. Hence we have a way of quickly eliminating possible operation mappings 
before moving on to the potentially more expensive task of mapping the input sorts. 

The second way this technique reduces the search space is by constraining the 
number of input sorts that have to be matched in an operation. As the example in 
Table 10 illustrated, the search space grows exponentially as the number of 
unmapped sorts for an operation grows. Using Table 10 again, if we map Q1 to C1 and 
Q2 to C2 then by applying the technique of immediately matching the output sort, we 
would have B mapped to Z and F mapped to Y. The fact that B is mapped means that we 
can eliminate B from the set of unmapped sorts in Q2 when performing sort matching for 
the Q2/C2 pair. This means only three of the four input sorts would have to be permuted 
to discover a sort mapping. In this particular case, however, since B is mapped to Z, 
either Z or some supertype of Z would have to be present in C2’s set of input sorts. 
It is not, so we can eliminate the Q2/C2 pairing immediately and prune 
this branch from the search space. 
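
The first of these two effects can be sketched in Python as a consistency check on the type map. The function name bind_output is hypothetical, the type map is assumed to be a one-to-one mapping from query sorts to candidate sorts, and subtype relationships are ignored for simplicity. 

```python
def bind_output(type_map, q_out, c_out):
    """Try to map the query's output sort to the candidate's output sort.

    Returns an extended copy of type_map, or None when either sort is
    already bound to a different partner (the branch can be pruned).
    """
    if type_map.get(q_out, c_out) != c_out:
        return None  # query sort is already mapped to another sort
    if c_out in type_map.values() and type_map.get(q_out) != c_out:
        return None  # candidate sort is already taken by another query sort
    extended = dict(type_map)
    extended[q_out] = c_out
    return extended

# Output sorts from the Table 11 discussion: Q1 -> B, Q2 -> B, C1 -> Z, C2 -> Y.
after_q1 = bind_output({}, "B", "Z")        # Q1/C1 binds B to Z
after_q2 = bind_output(after_q1, "B", "Y")  # Q2/C2 would need B -> Y: pruned
```

The check costs a couple of dictionary lookups per proposed operation pairing, which is negligible next to the permutation of input sorts it can avoid. 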


3. Match Predefined Types 


[9] and [2] both allude to the fact that basic types must be preserved in the partial 
signature map. Such a rule well serves the quest for reducing the signature matching 
search space by establishing more constraints that can be applied early in the process. 
For example, the previous section described how the output parameters could be matched 
immediately following an operation mapping to determine if such a pairing was worth 
exploring further. Incompatibilities were not caught, however, until at least two 
operations had been proposed for matching. By applying the constraints that predefined 
types impose, we have an opportunity to short circuit the branch even earlier. Consider 
Table 11 for example. If B is an integer, then Z must belong to the integer sort group 
such that Z is a subtype of B.* If Z does not meet this criterion then the branch can be 
pruned immediately and Q1 and C1 will never be considered for matching. If Z did pass 
such constraints then Q2 and C2 can be considered for matching, thereby subjecting Y to 
the same constraints that Z was required to pass. 

The preservation rules of predefined types can also be used to reduce the number 
of unmapped input sorts to permute. All of the query’s predefined sorts can be tested for 
compatibility with the candidate before the permutation process begins. If they all have 
matches then they can all be removed from the query’s set of input parameters, leaving 
just the unmapped user-defined types for permutation. This can sharply reduce the 
number of permutations that must be evaluated and therefore pare the search 
space down significantly. 


* [2] explicitly declares subtype matching rules for input and output parameters. Such rules and their 
applicability to the method of signature matching described in this thesis are addressed in section V.C.3. 
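
The effect of removing predefined sorts before permuting can be illustrated with a small Python sketch. The set PREDEFINED and the function names are assumptions made for the example, and predefined sorts are matched by exact name only; the subtype rules of [2] are deliberately left out. 

```python
from collections import Counter
from math import factorial

# Assumed set of predefined sorts, for illustration only.
PREDEFINED = {"integer", "natural", "boolean", "real", "string"}

def split_inputs(inputs):
    """Separate predefined sorts from user-defined sorts."""
    basics = [s for s in inputs if s in PREDEFINED]
    user_defined = [s for s in inputs if s not in PREDEFINED]
    return basics, user_defined

def remaining_permutations(q_inputs, c_inputs):
    """Return the number of input orderings left to try, or 0 when the
    predefined sorts cannot be matched (exact-name matching assumed)."""
    q_basics, q_user = split_inputs(q_inputs)
    c_basics, c_user = split_inputs(c_inputs)
    if Counter(q_basics) != Counter(c_basics) or len(q_user) != len(c_user):
        return 0
    return factorial(len(q_user))

# Predefined sorts disagree: the pairing is pruned before any permutation.
pruned = remaining_permutations(["integer", "A", "B"], ["boolean", "X", "Y"])

# Predefined sorts agree: only the two user-defined sorts are permuted.
reduced = remaining_permutations(["integer", "boolean", "A", "B"],
                                 ["boolean", "integer", "X", "Y"])
```

In the second case four input sorts would naively give 4! = 24 orderings, but only 2! = 2 remain once the predefined sorts are matched up front. 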


C. DESIGN AND IMPLEMENTATION 


This section is divided into two subsections. The first subsection introduces the 
objects used to implement the signature matching algorithms and the second subsection 
discusses the signature matching approach in terms of the objects defined in the first 
subsection. 


1. Signature Matching Types 


In order to better illustrate the signature matching strategy proposed in this thesis 
we must first examine the data types used to carry out the strategy. Figure 12 depicts the 
signature matching objects. 


Figure 12: Signature Matching Types 


The first object of interest is the SignatureMap. This is used to store the operation 
and type mappings between a query and a candidate and follows [9]’s definition of a 
partial signature map. 

At the crux of the design is the SigMatchNode. This data type is used to represent 
solutions in the signature matching search space by being represented as a node in a tree 
data structure. The node stores the signature and semantic ranks of the solution (the 
SignatureMap V) it represents and maintains validation and expansion information for 
search space maintenance. Since this object is used to form a tree, a handle to a single 
SigMatchNode can be used to contain the entire search space. When the signature 
matching process is finished, all the leaves of this tree can be considered valid solutions 
and therefore can be “clipped” from the tree and returned as the set of solutions. The 
getLeafNodes method is the leaf “clipper” in this case. 

Finally, the SigMatchNodeSet is used to store the set of SigMatchNodes that will 
be placed in the Candidate (section IV.C.4) object. Notice first that this collection is a set 
so that duplicate solution nodes can be easily eliminated and second that this set is 
ordered. The ordering is defined by the signature rank until semantic matching is 
performed. Once semantic matching has taken place, the semantic rank takes precedence. 


2. Signature Matching Strategy 


A high-level view of the signature matching strategy is depicted in Figure 13. 


Figure 13: High-level View of Signature Matching 


This illustration shows the context in which the core signature matching function, 
matchOps, operates. Once profile filtering is complete, each candidate with a profile rank 
above a certain threshold is passed to signatureMatch along with the original query. 
signatureMatch then outputs the same candidate passed in but with its set of 
SigMatchNodes populated. 

signatureMatch decomposes into four major steps. The first step calculates the 
profiles for the operations in the query and the candidate and returns them in the form of 
OpWithProfile sequences. An OpWithProfile, depicted in Figure 14, is simply an 
association between an operator and its profile. An OpWithProfileSeq is a sequence of 
OpWithProfiles ordered by the lexicographic ordering on profiles used in 
opWithProfileLessThan. The advantages of such an ordering were detailed in section 
V.B.1. The query’s and candidate’s OpWithProfileSeqs are then passed to matchOps where 
the actual signature matching takes place. matchOps passes the root SigMatchNode of 
the entire signature matching search space for the particular query and candidate to 
getLeafNodes where the valid signature match solutions represented in the leaves of the 
search space are extracted into a set of SigMatchNodes. Finally, the signature rank for 
each SigMatchNode in the set is computed and the set is then assigned to the Candidate. 


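[Figure 14 diagram not reproduced. Elements listed in the diagram: OpWithProfile with 
the associations op and op_profile and the operations opWithProfileEqual and 
opWithProfileLessThan; and the sequence operations addOpWithProfile, 
owpSeqEqual<opWithProfileEqual>, owpSeqMember<opWithProfileEqual>, 
owpSeqRemove<opWithProfileEqual>, and owpSeqSort<opWithProfileLessThan>.] 


Figure 14: OpWithProfile Data Types 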


The core signature matching routine is matchOps. Ada-like pseudo-code for 
matchOps is listed below: 


procedure match_ops(query: in OpWithProfileSeq, candidate: in OpWithProfileSeq, 
                    root_sn: in out SigMatchNode) is 
   temp_sn, return_val: SigMatchNode; 
   temp_query, temp_candidate: OpWithProfileSeq; 
   q_op, c_op: OpWithProfile; 
begin 
   return_val := root_sn; 

   -- depth-first-search into possible operation pairings 
   temp_query := query; 
   temp_candidate := candidate; 
   foreach OpWithProfile q_op in query loop 
      foreach OpWithProfile c_op in candidate 
            where q_op.op_profile = c_op.op_profile loop 
         temp_sn := root_sn; 
         op_map_pkg.bind(q_op.op, c_op.op, temp_sn.V.OM); 
         if not validPairingExists(temp_sn.V.OM, return_val) then 
            if match_outputs(temp_sn) then 
               if match_basics(get_basics(q_op.inputs), 
                               get_basics(c_op.inputs)) then 
                  temp_query := temp_query - q_op; 
                  temp_candidate := temp_candidate - c_op; 
                  match_ops(temp_query, temp_candidate, temp_sn); 
                  addBranch(temp_sn, return_val); 
               end if; 
            end if; 
         end if; 
      end loop; 
   end loop; 

   -- depth-first-search into possible input pairings 
   -- prune until all leaves are valid solutions 
   pruned := true; 
   while pruned loop 
      pruned := false; 
      root_sn := return_val; 
      foreach leafnode leaf_sn in root_sn loop 
         if leaf_sn.validation = UNKNOWN then 
            if not match_inputs(leaf_sn) then 
               leaf_sn.validation := INVALID; 
            elsif not verify_subtypes(leaf_sn) then 
               leaf_sn.validation := INVALID; 
            else 
               if length(leaf_sn.branches) = 0 then 
                  leaf_sn.validation := VALID; 
               else 
                  leaf_sn.expanded_for_inputs := true; 
               end if; 
            end if; 
            if leaf_sn.validation = INVALID then 
               removeAllMatchingBranches(leaf_sn, return_val); 
               pruned := true; 
            end if; 
         end if; 
      end loop; 
   end loop; 

   root_sn := return_val; 
end match_ops; 


There are two main sections to this procedure. The first is a depth-first search into the 
space of all compatible operation pairs and the second is a combined matching of input 
sorts and retraction of invalid nodes. The first section steps through each operation in the 
query, trying to match it to an operation in the candidate. This is done by invoking the 
following three steps in order: first verifying that the profiles are equal, second verifying 
if the outputs match (see section V.B.2) and third verifying that the predefined types can 
match (see section V.B.3). If any of these three steps fail, the operation pairing is not 
considered and the remaining tests are immediately short-circuited to reduce time. If the 
three steps succeed then the pairing is added as a branch to the root SigMatchNode 
passed in to matchOps and matchOps is recursively called again with the same query and 
candidate OpWithProfileSeqs with the operations just paired removed. Figure 15 
illustrates the search space for possible operation pairings for the query and component in 
Table 10. The highlighted path is the path searched before moving on to the second part 
of matchOps that involves matching the input sorts and performing any possible 
retractions of invalid nodes. The rest of the space is depicted here to illustrate the nature 
of the search space but at this point only the highlighted nodes have been instantiated. 
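
The first phase can be approximated by the following Python sketch. It is a deliberate simplification of match_ops rather than a translation: it enumerates only full operation maps, applies just the profile and output tests, and returns plain dictionaries instead of building a SigMatchNode tree. The profile labels P4 and P5 are hypothetical abbreviations, and the operations are taken from Tables 10 and 11. 

```python
def match_ops(query, candidate, op_map=None, type_map=None):
    """Depth-first search over operation pairings.

    query / candidate: lists of (name, profile, output_sort), assumed
    already sorted by profile. Yields (op_map, type_map) for every way of
    pairing all query operations that passes the profile and output tests.
    """
    op_map = op_map or {}
    type_map = type_map or {}
    if not query:
        yield op_map, type_map
        return
    (q_name, q_profile, q_out), rest = query[0], query[1:]
    for i, (c_name, c_profile, c_out) in enumerate(candidate):
        if q_profile != c_profile:
            continue  # test 1: profiles must be equal
        if type_map.get(q_out, c_out) != c_out:
            continue  # test 2: output sort already mapped elsewhere
        yield from match_ops(
            rest,
            candidate[:i] + candidate[i + 1:],
            {**op_map, q_name: c_name},
            {**type_map, q_out: c_out},
        )

component = [("C1", "P4", "Z"), ("C2", "P5", "Y"),
             ("C3", "P5", "T"), ("C4", "P5", "S")]
table10_query = [("Q1", "P4", "B"), ("Q2", "P5", "F"), ("Q3", "P5", "E")]
table11_query = [("Q1", "P4", "B"), ("Q2", "P5", "B"), ("Q3", "P5", "E")]

table10_maps = list(match_ops(table10_query, component))
table11_maps = list(match_ops(table11_query, component))
```

For the Table 10 query the sketch finds six operation maps that survive the output test. For the Table 11 query it finds none, because Q2's output B is already bound to Z by the Q1/C1 pairing. 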
Figure 15: Example Search Space for Compatible Operations 


Now that we have searched to the bottom of this particular path we can begin 
expanding the search space for matching the inputs for the various operations paired in 
the node. Ada-like pseudo-code is listed below for this phase of signature matching. 


-- function match_inputs 


function match_inputs(root_sn: in out SolutionNode) return boolean is 

   function match(q_inputs: in TypeSequence; c_inputs: in TypeSequence; 
                  root_sn: in out SolutionNode) return boolean is 
   begin 
      -- recursive stopping case: nothing left to map, so the match succeeds 
      if size(q_inputs) = 0 then 
         return true; 
      end if; 

      return_val := root_sn; 

      new_q_inputs := q_inputs; 
      new_c_inputs := c_inputs; 

      -- verify mapped inputs in q_inputs are legally mapped 
      -- and set new_q_inputs and new_c_inputs to only 
      -- the unmapped inputs 
      foreach input_type qi in q_inputs loop 
         if type_map_pkg.member(qi, root_sn.V.TM) then 
            ci := type_map_pkg.fetch(root_sn.V.TM, qi); 
            -- if the current input type is already mapped 
            -- then make sure it is mapped to an existing type 
            -- in the candidate's input. 
            if not type_sequence_pkg.member(ci, c_inputs) then 
               return false; 
            end if; 

            new_q_inputs := new_q_inputs - qi; 
            new_c_inputs := new_c_inputs - ci; 
         end if; 
      end loop; 
      qi := q_inputs[1]; 
      foreach input_type ci in c_inputs loop 
         temp_sn := root_sn; 
         temp_sn.expanded_for_inputs := false; 
         type_map_pkg.bind(qi, ci, temp_sn.V.TM); 
         if not match(new_q_inputs - qi, new_c_inputs - ci, temp_sn) then 
            return false; 
         end if; 
         addBranch(temp_sn, return_val); 
      end loop; 

      root_sn := return_val; 
      return true; 
   end match; 

begin 
   foreach op_mapping om in root_sn.V.OM loop 
      -- remove the input types that have already been mapped 
      q_inputs := om.key.inputs - type_map_pkg.map_domain(root_sn.V.TM); 
      c_inputs := om.result.inputs - type_map_pkg.map_range(root_sn.V.TM); 

      -- if the number of remaining input types for the query and 
      -- the candidate are unequal then the operations cannot match 
      if type_sequence_pkg.length(q_inputs) /= 
         type_sequence_pkg.length(c_inputs) then 
         return false; 
      end if; 

      -- if the node has already been expanded before to try and match 
      -- the inputs and it still has unmapped input types then return 
      -- false so we won't try again 
      if root_sn.expanded_for_inputs then 
         return size(q_inputs) = 0; 
      else 
         return match(get_basics(q_inputs), get_basics(c_inputs), root_sn); 
      end if; 
   end loop; 

end match_inputs; 


This function first removes any sorts from the set of inputs that have already been 
mapped. At this point in our example, this would mean removing any input sorts that 
were the same as the output sorts since the output sorts have been mapped. If this results 
in an unequal number of unmapped sorts between the query and the candidate then we can 
immediately stop and return false since there cannot be a match. Next, if this node has 
already been expanded in the past and it ultimately led to an invalid node then we do not 
want to expand this node again since we know where it leads. Finally, if we make it 
through these preliminary checks then we can pass the node on to the recursive function 
match that will expand the node into all the possible input sort pairings to be investigated. 
If there are no legal possibilities, match will return false and cause match_inputs to return 
false, signaling match_ops to flag this node as invalid. 
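
The checks described above can be condensed into a Python sketch for a single operation pairing. It is an approximation of match_inputs under stated assumptions: the input sorts within an operation are taken to be distinct, the type map is a dictionary from query sorts to candidate sorts, subtype rules are ignored, and the extensions are returned as a flat list rather than added as branches of a node. The input sorts used for C1 are those read from Table 11. 

```python
from itertools import permutations

def match_inputs(q_inputs, c_inputs, type_map):
    """Return every extension of type_map that pairs each query input
    sort with a distinct candidate input sort, or [] if none exists."""
    # Inputs that are already mapped must map into the candidate's inputs.
    for qi in q_inputs:
        if qi in type_map and type_map[qi] not in c_inputs:
            return []
    # Only the unmapped sorts on each side remain to be paired.
    q_free = [qi for qi in q_inputs if qi not in type_map]
    c_free = [ci for ci in c_inputs if ci not in type_map.values()]
    if len(q_free) != len(c_free):
        return []
    return [{**type_map, **dict(zip(q_free, perm))}
            for perm in permutations(c_free)]

# E is already mapped to T, which is not an input sort of C1: no match.
blocked = match_inputs(["E", "A", "D"], ["V", "W", "Y"], {"E": "T"})

# Only the output mapping B -> Z exists: all 3! input pairings are produced.
expanded = match_inputs(["E", "A", "D"], ["V", "W", "Y"], {"B": "Z"})
```

An empty result plays the role of match_inputs returning false, which is the signal for match_ops to flag the node as invalid. 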

Going back to our example, the node for which we are currently trying to match 
inputs cannot be expanded because Q1’s first input sort, E, is already mapped to T but 
there is no T in the input sorts for C1! The test for making sure the number of unmapped 
input sorts in the query is equal to the number of unmapped input sorts for the candidate 
will fail and cause match_inputs to return false. Looking back at match_ops, this will 
cause the node to be pruned and match_ops will pop back to its previous instantiation on 
the stack and search the next possibility in the search space. The tree at this point is 
shown in Figure 16. 


Figure 16: Search Space after First Pruning 


The current node will be pruned for the same reasons the first node was pruned: E is 
already mapped to a sort that does not exist in the candidate’s input sorts. So again, 
another node is pruned, match_ops pops back up and we are now left with the tree in 
Figure 17. 


Figure 17: Search Space after Second Pruning 





Now we have a node that will pass all the preliminary steps and successfully 
expand for all the possible input pairings. Given the relatively large number of 
unmapped inputs in Q1 and Q2, the expansion is quite significant. Figure 18 shows part 
of this expansion. 


Figure 18: Input Sort Matching Expansion 


The figure effectively illustrates the combinatorial explosion associated with large 
numbers of unmapped inputs and underscores the need for the improvements cited in 
section V.B. Figure 18 only shows the expansion for the first pairing Q1/C1 and one 
node for the pairing Q2/C2. Q2/C2 is expanded in the same way Q1/C1 is expanded but 
is not shown here due to space limitations. The entire expansion is ultimately returned 
to match_ops where the valid leaves will continue to go through the match_inputs 
expansion. In this case all the leaves shown in Figure 18 will have to be expanded further 
to add any more mappings Q2/C2 brings on top of the Q1/C1 mappings. Similarly this 
would be done for Q1/C1 on top of the Q2/C2 mappings in the portion of the tree not 
shown. The expansion/prune loop in match_ops continues until there are no more leaves 
to expand and all of the existing leaves are valid. 

A legitimate concern might arise from the example thus far regarding the fact that 
Q1/C1 and Q2/C2 are expanded twice, but in different order. This must take place 
because it is possible to have different input sort mappings depending on the order in 
which the operations are expanded. For instance, notice in the first node representing 
Q2/C2's input matching (whose expansion is not shown further in Figure 18 due to space) 
that A is mapped to J. In the Q1/C1 expansions A is never mapped to J. This possibility 
would never be explored in this part of the search space if the different ordering of 
pairing expansions were not represented. 


3. Subtype Matching 


In section V.C.2 we saw that match_ops tests whether the predefined types in the 
operations for an operation pairing can be legally matched. The Ada-like pseudo-code 
for determining this legality for the input types is listed below. 


function match_basics(q_basics: in out TypeSequence; c_basics: in out TypeSequence) 
   return boolean is 
begin 
   -- cannot match if query has different number of basics than the 
   -- candidate 
   if size(q_basics) /= size(c_basics) then 
      return false; 
   end if; 

   -- Basic types: either they must match exactly 
   -- or the query's input type must be a 
   -- subtype of the component's input type. 

   -- filter out the basics that match exactly 
   new_q_basics := q_basics; 
   new_c_basics := c_basics; 
   foreach input_type qi in q_basics loop 
      foreach input_type ci in c_basics loop 
         if equal(qi, ci) then 
            new_q_basics := new_q_basics - qi; 
            new_c_basics := new_c_basics - ci; 
            break; -- out of inner foreach loop 
         end if; 
      end loop; 
   end loop; 

   q_basics := new_q_basics; 
   c_basics := new_c_basics; 

   -- Filter out the remaining basics that can match to supertypes. 
   -- This is done by temporarily mapping each query input type to a 
   -- supertype in the candidate that is closest in the partial ordering. 
   foreach input_type qi in q_basics loop 
      foreach input_type ci in c_basics loop 
         if subtype_of(qi, ci) then 
            found_ci2 := false; 
            foreach input_type ci2 in new_c_basics loop 
               if not found_ci2 and subtype_of(qi, ci2) and not equal(ci, ci2) 
                  and subtype_of(ci2, ci) then 
                  found_ci2 := true; 
               end if; 
            end loop; 
            if not found_ci2 then 
               new_q_basics := new_q_basics - qi; 
               new_c_basics := new_c_basics - ci; 
            end if; 
         end if; 
      end loop; 
   end loop; 

   -- if there are any basics left over then match is not possible 
   -- since basics cannot be matched to non-basics 
   return size(new_q_basics) = 0 and size(new_c_basics) = 0; 
end match_basics; 

Of particular note is the portion of the function that is responsible for subtype matching. 
Without subtype matching the operation pairing would fail if there were predefined types 
left over after all the predefined types that can find exact matches were mapped. With 
subtype matching, however, the possibilities of mapping the remaining predefined types 
are explored. 

[2] defines subtype matching rules for mapping the input and output types of an 
operation. These rules are summarized below: 

1. an input type of a query must be a subtype of the input type in the candidate to 
which it is mapped 
2. an output type of a query must be a supertype of the output type in the candidate 
to which it is mapped 


These rules are followed in the pseudo-code above. An interesting case arises when there 
is more than one supertype available in the candidate. In such a case the algorithm above 
will choose the supertype closest in the partial ordering for that particular sort group. For 
instance, if we are trying to map a positive in the query and the candidate has a natural 
and an integer still unmapped, then the natural is selected over the integer because the 
positive is closer to the natural in the partial ordering of the integer sort group. This has 
the advantage that the less refined sorts remain available in the candidate for potential 
mappings with less refined sorts in the query. 
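The closest-supertype choice can be illustrated with a short Python sketch. It assumes, as the example in the text does, that positive, natural, and integer form a chain in the partial ordering of the integer sort group; the `PARENT` table and both function names are hypothetical and are not taken from the thesis software.

```python
# Hypothetical partial ordering for the integer sort group:
# each type maps to its immediate supertype (None at the top).
PARENT = {"positive": "natural", "natural": "integer", "integer": None}


def distance_to_supertype(sub, sup):
    """Steps from sub up to sup in the ordering, or None if sup is not above sub."""
    steps, current = 0, sub
    while current is not None:
        if current == sup:
            return steps
        current = PARENT.get(current)
        steps += 1
    return None


def closest_supertype(q_type, c_unmapped):
    """Pick the unmapped candidate type nearest to q_type in the ordering."""
    best, best_distance = None, None
    for c_type in c_unmapped:
        d = distance_to_supertype(q_type, c_type)
        if d is not None and (best_distance is None or d < best_distance):
            best, best_distance = c_type, d
    return best
```

For a positive in the query and a candidate with natural and integer still unmapped, `closest_supertype("positive", ["integer", "natural"])` selects natural, leaving integer available for a less refined query sort.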

[9] extends the rules in [2] by maintaining subtype consistency throughout the partial 
signature map. For example, suppose we are trying to match Q1 and C1 from Table 10 
and we map E to V and A to W. The subtype rules in [9] state that if E is a subtype of A 
then V must be a subtype of W, otherwise the mapping is invalid. The test of such 
consistency in the approach outlined in this thesis is made by the call to verify_subtypes 
in match_ops. If the test fails for all the mapped sorts in the node then the node is 
considered invalid and pruned. 
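The consistency rule from [9] can be sketched as a pairwise check over the partial map. The Python below is only an illustration: the name verify_subtypes comes from the text, but its signature, the dictionary form of the map, and the encoding of subtype relations as sets of (subtype, supertype) pairs are assumptions.

```python
def verify_subtypes(sort_map, q_subtypes, c_subtypes):
    """Check that subtype relations among query sorts are preserved by the map.

    sort_map: partial map from query sorts to candidate sorts.
    q_subtypes, c_subtypes: sets of (subtype, supertype) pairs for the
    query and the candidate (hypothetical encoding).
    """
    for q1, c1 in sort_map.items():
        for q2, c2 in sort_map.items():
            # If q1 is a subtype of q2, the images must be related the same way.
            if (q1, q2) in q_subtypes and (c1, c2) not in c_subtypes:
                return False
    return True
```

With E mapped to V and A mapped to W, and E a subtype of A, the check succeeds only when (V, W) is also a subtype pair in the candidate.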
VI. EXPERIMENTATION 


To test the effectiveness of the syntactic matching improvements presented in this 
thesis the CAPS software base [10] was selected as well as the following four queries: 


Query: Stack 


TYPE Stack 
SPECIFICATION 
  OPERATOR Copy 
  SPECIFICATION 
    INPUT 
      From_The_Stack : Stack, 
      To_The_Stack : Stack 
    OUTPUT 
      To_The_Stack : Stack 
    EXCEPTIONS 
      Overflow, Underflow 
  END 

  OPERATOR Clear 
  SPECIFICATION 
    INPUT 
      The_Stack : Stack 
    OUTPUT 
      The_Stack : Stack 
    EXCEPTIONS 
      Overflow, Underflow 
  END 

  OPERATOR Push 
  SPECIFICATION 
    INPUT 
      The_Item : Integer, 
      On_The_Stack : Stack 
    OUTPUT 
      On_The_Stack : Stack 
    EXCEPTIONS 
      Overflow, Underflow 
  END 

  OPERATOR Pop 
  SPECIFICATION 
    INPUT 
      The_Stack : Stack 
    OUTPUT 
      The_Stack : Stack 
    EXCEPTIONS 
      Overflow, Underflow 
  END 

  OPERATOR Is_Equal 
  SPECIFICATION 
    INPUT 
      Left : Stack, 
      Right : Stack 
    OUTPUT 
      Result : Boolean 
    EXCEPTIONS 
      Overflow, Underflow 
  END 

  OPERATOR Depth_Of 
  SPECIFICATION 
    INPUT 
      The_Stack : Stack 
    OUTPUT 
      Result : Natural 
    EXCEPTIONS 
      Overflow, Underflow 
  END 

  OPERATOR Is_Empty 
  SPECIFICATION 
    INPUT 
      The_Stack : Stack 
    OUTPUT 
      Result : Boolean 
    EXCEPTIONS 
      Overflow, Underflow 
  END 

  OPERATOR Top_Of 
  SPECIFICATION 
    INPUT 
      The_Stack : Stack 
    OUTPUT 
      Result : Integer 
    EXCEPTIONS 
      Overflow, Underflow 
  END 
END 
IMPLEMENTATION ADA 
  Stack_Sequential_Bounded_Managed_Iterator 
END 


Query: Set 


TYPE Set 
SPECIFICATION 
  OPERATOR Copy 
  SPECIFICATION 
    INPUT 
      From_The_Set : Set, 
      To_The_Set : Set 
    OUTPUT 
      To_The_Set : Set 
    EXCEPTIONS 
      Overflow, Item_Is_In_Set, 
      Item_Is_Not_In_Set 
  END 

  OPERATOR Clear 
  SPECIFICATION 
    INPUT 
      The_Set : Set 
    OUTPUT 
      The_Set : Set 
    EXCEPTIONS 
      Overflow, Item_Is_In_Set, 
      Item_Is_Not_In_Set 
  END 

  OPERATOR Add 
  SPECIFICATION 
    INPUT 
      The_Item : Item, 
      To_The_Set : Set 
    OUTPUT 
      To_The_Set : Set 
    EXCEPTIONS 
      Overflow, Item_Is_In_Set, 
      Item_Is_Not_In_Set 
  END 

  OPERATOR Remove 
  SPECIFICATION 
    INPUT 
      The_Item : Item, 
      From_The_Set : Set 
    OUTPUT 
      From_The_Set : Set 
    EXCEPTIONS 
      Overflow, Item_Is_In_Set, 
      Item_Is_Not_In_Set 
  END 

  OPERATOR Union 
  SPECIFICATION 
    INPUT 
      Of_The_Set : Set, 
      And_The_Set : Set, 
      To_The_Set : Set 
    OUTPUT 
      To_The_Set : Set 
    EXCEPTIONS 
      Overflow, Item_Is_In_Set, 
      Item_Is_Not_In_Set 
  END 

  OPERATOR Intersection 
  SPECIFICATION 
    INPUT 
      Of_The_Set : Set, 
      And_The_Set : Set, 
      To_The_Set : Set 
    OUTPUT 
      To_The_Set : Set 
    EXCEPTIONS 
      Overflow, Item_Is_In_Set, 
      Item_Is_Not_In_Set 
  END 

  OPERATOR Difference 
  SPECIFICATION 
    INPUT 
      Of_The_Set : Set, 
      And_The_Set : Set, 
      To_The_Set : Set 
    OUTPUT 
      To_The_Set : Set 
    EXCEPTIONS 
      Overflow, Item_Is_In_Set, 
      Item_Is_Not_In_Set 
  END 

  OPERATOR Is_Equal 
  SPECIFICATION 
    INPUT 
      Left : Set, 
      Right : Set 
    OUTPUT 
      Result : Boolean 
    EXCEPTIONS 
      Overflow, Item_Is_In_Set, 
      Item_Is_Not_In_Set 
  END 

  OPERATOR Extent_Of 
  SPECIFICATION 
    INPUT 
      The_Set : Set 
    OUTPUT 
      Result : Natural 
    EXCEPTIONS 
      Overflow, Item_Is_In_Set, 
      Item_Is_Not_In_Set 
  END 

  OPERATOR Is_Empty 
  SPECIFICATION 
    INPUT 
      The_Set : Set 
    OUTPUT 
      Result : Boolean 
    EXCEPTIONS 
      Overflow, Item_Is_In_Set, 
      Item_Is_Not_In_Set 
  END 

  OPERATOR Is_A_Member 
  SPECIFICATION 
    INPUT 
      The_Item : Item, 
      Of_The_Set : Set 
    OUTPUT 
      Result : Boolean 
    EXCEPTIONS 
      Overflow, Item_Is_In_Set, 
      Item_Is_Not_In_Set 
  END 

  OPERATOR Is_A_Subset 
  SPECIFICATION 
    INPUT 
      Left : Set, 
      Right : Set 
    OUTPUT 
      Result : Boolean 
    EXCEPTIONS 
      Overflow, Item_Is_In_Set, 
      Item_Is_Not_In_Set 
  END 

  OPERATOR Is_A_Proper_Subset 
  SPECIFICATION 
    INPUT 
      Left : Set, 
      Right : Set 
    OUTPUT 
      Result : Boolean 
    EXCEPTIONS 
      Overflow, Item_Is_In_Set, 
      Item_Is_Not_In_Set 
  END 
END 
IMPLEMENTATION ADA 
  Set_Simple_Sequential_Bounded_Managed_Iterator 
END 


Query: Map 


TYPE Map 
SPECIFICATION 
  OPERATOR Copy 
  SPECIFICATION 
    INPUT 
      From_The_Map : Map, 
      To_The_Map : Map 
    OUTPUT 
      To_The_Map : Map 
    EXCEPTIONS 
      Overflow, Domain_Is_Not_Bound, 
      Multiple_Binding 
  END 

  OPERATOR Clear 
  SPECIFICATION 
    INPUT 
      The_Map : Map 
    OUTPUT 
      The_Map : Map 
    EXCEPTIONS 
      Overflow, Domain_Is_Not_Bound, 
      Multiple_Binding 
  END 

  OPERATOR Bind 
  SPECIFICATION 
    INPUT 
      The_Domain : Natural, 
      And_The_Range : Ranges, 
      In_The_Map : Map 
    OUTPUT 
      In_The_Map : Map 
    EXCEPTIONS 
      Overflow, Domain_Is_Not_Bound, 
      Multiple_Binding 
  END 

  OPERATOR Unbind 
  SPECIFICATION 
    INPUT 
      The_Domain : Natural, 
      In_The_Map : Map 
    OUTPUT 
      In_The_Map : Map 
    EXCEPTIONS 
      Overflow, Domain_Is_Not_Bound, 
      Multiple_Binding 
  END 

  OPERATOR Is_Equal 
  SPECIFICATION 
    INPUT 
      Left : Map, 
      Right : Map 
    OUTPUT 
      Result : Boolean 
    EXCEPTIONS 
      Overflow, Domain_Is_Not_Bound, 
      Multiple_Binding 
  END 

  OPERATOR Extent_Of 
  SPECIFICATION 
    INPUT 
      The_Map : Map 
    OUTPUT 
      Result : Natural 
    EXCEPTIONS 
      Overflow, Domain_Is_Not_Bound, 
      Multiple_Binding 
  END 

  OPERATOR Is_Empty 
  SPECIFICATION 
    INPUT 
      The_Map : Map 
    OUTPUT 
      Result : Boolean 
    EXCEPTIONS 
      Overflow, Domain_Is_Not_Bound, 
      Multiple_Binding 
  END 

  OPERATOR Is_Bound 
  SPECIFICATION 
    INPUT 
      The_Domain : Natural, 
      In_The_Map : Map 
    OUTPUT 
      Result : Boolean 
    EXCEPTIONS 
      Overflow, Domain_Is_Not_Bound, 
      Multiple_Binding 
  END 

  OPERATOR Range_Of 
  SPECIFICATION 
    INPUT 
      The_Domain : Natural, 
      In_The_Map : Map 
    OUTPUT 
      Result : Ranges 
    EXCEPTIONS 
      Overflow, Domain_Is_Not_Bound, 
      Multiple_Binding 
  END 
END 
IMPLEMENTATION ADA 
  Map_Simple_Noncached_Sequential_Unbounded_Managed_Iterator 
END 


Query: Queue 


TYPE Queue 
SPECIFICATION 
  OPERATOR Copy 
  SPECIFICATION 
    INPUT 
      From_The_Queue : Queue, 
      To_The_Queue : Queue 
    OUTPUT 
      To_The_Queue : Queue 
    EXCEPTIONS 
      Overflow, Underflow 
  END 

  OPERATOR Clear 
  SPECIFICATION 
    INPUT 
      The_Queue : Queue 
    OUTPUT 
      The_Queue : Queue 
    EXCEPTIONS 
      Overflow, Underflow 
  END 

  OPERATOR Add 
  SPECIFICATION 
    INPUT 
      The_Item : Item, 
      To_The_Queue : Queue 
    OUTPUT 
      To_The_Queue : Queue 
    EXCEPTIONS 
      Overflow, Underflow 
  END 

  OPERATOR Pop 
  SPECIFICATION 
    INPUT 
      The_Queue : Queue 
    OUTPUT 
      The_Queue : Queue 
    EXCEPTIONS 
      Overflow, Underflow 
  END 

  OPERATOR Is_Equal 
  SPECIFICATION 
    INPUT 
      Left : Queue, 
      Right : Queue 
    OUTPUT 
      Result : Boolean 
    EXCEPTIONS 
      Overflow, Underflow 
  END 

  OPERATOR Length_Of 
  SPECIFICATION 
    INPUT 
      The_Queue : Queue 
    OUTPUT 
      Result : Natural 
    EXCEPTIONS 
      Overflow, Underflow 
  END 

  OPERATOR Is_Empty 
  SPECIFICATION 
    INPUT 
      The_Queue : Queue 
    OUTPUT 
      Result : Boolean 
    EXCEPTIONS 
      Overflow, Underflow 
  END 

  OPERATOR Front_Of 
  SPECIFICATION 
    INPUT 
      The_Queue : Queue 
    OUTPUT 
      Result : Item 
    EXCEPTIONS 
      Overflow, Underflow 
  END 
END 
IMPLEMENTATION ADA 
  Queue_Nonpriority_Nonbalking_Sequential_Bounded_Managed_Iterator 
END 


The queries are instantiations of generic components from the software base to ensure 
interesting matching activity. An attempt was made to instantiate the generic parameters 
differently in order to facilitate observation of the different manifestations of 
improvement in this thesis. For instance, the stack and queue queries use predefined types 
for the generic parameters, whereas the set and map queries do not. As a result, the 
sensitivity to improvements involving predefined types will differ among the queries. 


A. GENERIC COMPONENT INSTANTIATION 


The CAPS software base contained 80 components, most of which are generic 
abstract data types. After instantiating all of the generic components with all possible 
combinations (see section IV.C) the number of searchable components increased to 566. 
This is a substantial increase and demonstrates the need for quick filters early in the 
search process. 


B. PROFILE FILTERING 


1. Software Base Resolution 


As mentioned in section IV.B.1, increasing the resolution of profiles will increase 
the resolution of the software base by requiring more non-empty partitions (haase-nodes) 
to store the components. An increased partition count means there will be fewer 
components sharing partitions and therefore contributes to an increase in precision 
without a loss in recall during profile filtering. A simple metric for determining the 
effectiveness of the resolution improvements is the number of partitions necessary to 
store all the components in the software base. The more partitions, the more effective the 
profile resolution improvement. Figure 19 illustrates the effectiveness of the resolution 
improvements outlined in this thesis on the CAPS software base. The graph shows that 
applying all the resolution improvements yields a 65% gain in the number of partitions 
over no improvements at all. 


Figure 19: Effectiveness of Profile Resolution Improvements on Software 
Base Partitioning (axis: Number of Partitions) 
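The partition-count metric can be expressed compactly. The following Python sketch assumes each component's profile is available as a hashable value (here a tuple) and that components with equal profiles share a partition; the function names and the sample profiles are illustrative only and are not measurements from the CAPS software base.

```python
def count_partitions(profiles):
    """Number of non-empty partitions: one per distinct profile."""
    return len(set(profiles))


def partition_gain(base_profiles, improved_profiles):
    """Relative gain in partition count of the improved profiles over the base."""
    base = count_partitions(base_profiles)
    return (count_partitions(improved_profiles) - base) / base


# Made-up profiles for three components, before and after adding a
# (hypothetical) extra profile field that separates two of them.
before = [(2, 1), (2, 1), (3, 1)]
after = [(2, 1, 0), (2, 1, 4), (3, 1, 0)]
gain = partition_gain(before, after)  # 2 partitions become 3
```

A higher-resolution profile never merges partitions, so the gain is zero or positive; the larger it is, the fewer components share a partition.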


By observing the effects of the different resolution improvements individually it is 
evident that the kinds of components in the software base drive the effectiveness of the 
various improvements differently. For instance, in the CAPS software base the 
components have operations similar to one another that do not vary in the frequency of 
the value sort. Thus the value sort frequency improvement has no effect. The 
components do, however, make reference to various predefined types, thereby causing the 
substantial increase in partitions. If the CAPS software base did not use predefined types 
at all, then such an improvement would have no effect. 


2. Profile Filtering Performance 


Increasing the resolution of the profiles should cause an increase in precision 
without a recall penalty. Hence, we want to see a reduction in the number of components 
returned at high profile rank thresholds. Such behavior is exactly what we see in the 
graphs illustrated in Figure 20 through Figure 23. For each query a substantially greater 
number of components are filtered out at high profile rank thresholds when all of the 
profile improvements are employed than when none of the improvements are employed. 
The effectiveness of each profile improvement individually is again dependent upon the 
properties of the query and the components in the software base. As we saw in Figure 19, 
the CAPS software base is rather sensitive to the predefined sort frequency improvement. 
As expected, this sensitivity is evident during profile filtering. For example, when the 
profile rank threshold is set at 1 (requiring a 100% match) the predefined sort frequency 
improvement causes the number of recalled components to be drastically reduced. 
Figure 20: Histogram Comparison of Profile Filtering Results with Stack Query 
Figure 21: Histogram Comparison of Profile Filtering Results with Set Query 
Figure 22: Histogram Comparison of Profile Filtering Results with Map Query 
Figure 23: Histogram Comparison of Profile Filtering Results with Queue Query 


Figure 24 through Figure 27 present a different perspective on the effectiveness of 
the resolution improvements during profile filtering. These graphs maintain a running 
sum of the number of recalled components throughout the continuum of profile rank 
thresholds. They show us that at a profile rank threshold of .65 (65% of the operations in 
the query must be in the component) the improvements lose their advantage. In other 
words, if the user sets the profile rank threshold above .65, the resolution improvements 
presented in this thesis will have a significantly positive effect on increasing precision. 
Figure 24: Running-Sum Comparison of Profile Filtering with Stack Query 
Figure 25: Running-Sum Comparison of Profile Filtering with Set Query 
Figure 26: Running-Sum Comparison of Profile Filtering with Map Query 
Figure 27: Running-Sum Comparison of Profile Filtering with Queue Query 
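The relationship between the profile rank threshold and the running-sum graphs can be sketched as follows. This Python fragment is an illustration under stated assumptions: candidates are (component id, profile rank) pairs, a candidate is recalled when its rank is at least the threshold, and the sample ranks are invented. The appendix declares a profileSkim function, but its exact semantics are not shown in this section, so `profile_skim` below is only a stand-in.

```python
def profile_skim(threshold, candidates):
    """Keep the candidates whose profile rank meets the threshold."""
    return [(cid, rank) for cid, rank in candidates if rank >= threshold]


def running_sum(candidates, thresholds):
    """Number of recalled components at each threshold, in the order given."""
    return [len(profile_skim(t, candidates)) for t in thresholds]


# Invented ranks for three components.
sample = [("a", 1.0), ("b", 0.75), ("c", 0.5)]
recalled = running_sum(sample, [1.0, 0.65, 0.0])
```

Lowering the threshold can only add components, so the sequence is non-decreasing as the threshold falls; comparing two such sequences, with and without the resolution improvements, shows where the improvements stop filtering out additional components.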
C. SIGNATURE MATCHING 


This thesis has presented improvements to signature matching both indirectly 
through improvements to profile resolution and directly through early pruning of the 
search space. This section illustrates the effectiveness of these improvements 
respectively. 


1. Effectiveness of Profile Improvements on Signature Matching 


Determining the effects profile resolution improvements have on signature 
matching is difficult because the traditional rankings that order the outcome of profile 
filtering and signature matching are not orthogonal. Furthermore, it is difficult to 
compare the effects the individual profile resolution improvements have on signature 
matching because they will cause the profile filtering process to potentially return 
different sets of components to pass on to signature matching. Finally, different queries 
can cause behavior that is difficult to correlate. For these reasons, a concise 
quantification of the effect profile improvements have on signature matching will not be 
made in this thesis. Rather, informal comments can be made regarding the results of the 
signature matching process for the four queries and the various resolution improvements 
in Figure 28 through Figure 31. These graphs compare the effects the different profile 
resolution improvements have on signature matching by showing distributions of the number 
of valid partial signature maps for each signature rank. The signature matching was 
performed on a randomly selected component that had a profile rank of 1 (100% of the 
query's operations had compatible operation profiles in the component), meaning a 
signature rank of 1 was possible. 

To begin, Figure 28 shows that when all of the profile resolution improvements 
are active, a valid signature map where 88% of the stack query's operations are mapped 
can be obtained. For the particular candidate chosen, this is not possible when no 
improvements are active. 

Figure 28: Effectiveness of Profile Improvements on Signature Matching with Stack Query 


In Figure 29 two main characteristics are visible. First, the number of valid 
signature maps that are generated is substantially greater than with the other queries. 
This is due to the fact that the set query has many operations that are compatible with the 
candidate from the software base. As described in section V.B, the possible permutations 
grow exponentially as the number of operations with compatible operation profiles 
grows. The second characteristic to notice is the lack of performance from the predefined 
sort frequency improvement. Throughout the distribution it returns the same number of 
maps as the version without improvements. This can be attributed to the lack of 
predefined types in the inputs of the operation signatures, causing the predefined sort 
frequency improvement to make only a small precision improvement over no resolution 
improvements for a profile rank of 1 (see Figure 25). The same candidate happened to be 
chosen for both cases and hence the same signature matching results ensued. 

Figure 29: Effectiveness of Profile Improvements on Signature Matching with Set Query 


Figure 30 and Figure 31 both have examples of 100% success in syntactic 
matching. The profile resolution improvements do not make any difference in these 
examples, however, primarily because the combination of the query and software base 
caused the random selection of the candidate to select the same candidate. 

Figure 30: Effectiveness of Profile Improvements on Signature Matching with Map Query 
Figure 31: Effectiveness of Profile Improvements on Signature Matching with Queue Query 


2. Signature Matching Algorithm Performance 


The signature matching improvements presented in section V.B can be observed 
by counting the number of nodes that pass and fail the early tests for output matching and 
predefined type matching. Of particular interest is the number of failed nodes. Failed 
nodes represent nodes that are pruned. Clearly, the more nodes pruned the better. Such a 
measurement shows off the signature matching improvements presented in this thesis. 
The graphs in Figure 32 through Figure 35 show pass/fail node measurements for each 
query and compare between the various profile resolution improvements. 

Figure 32: Signature Matching Algorithm Performance with Stack Query 
Figure 33: Signature Matching Algorithm Performance with Set Query 
Figure 34: Signature Matching Algorithm Performance with Map Query 
Figure 35: Signature Matching Algorithm Performance with Queue Query 
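The pass/fail measurement can be pictured as a small instrumentation layer around the early tests. The Python sketch below is an assumption about how such counts might be gathered, not the instrumentation used in the thesis software; the function name, the test labels, and the dictionary-based nodes are all hypothetical.

```python
from collections import Counter


def tally_early_tests(nodes, tests):
    """Count how many nodes pass or fail each early test.

    nodes: iterable of search-space nodes (any objects).
    tests: ordered list of (name, predicate) pairs applied to each node.
    """
    stats = Counter()
    for node in nodes:
        for name, test in tests:
            if test(node):
                stats[(name, "pass")] += 1
            else:
                stats[(name, "fail")] += 1
                break  # the node is pruned, so later tests are not run
    return stats


# Invented nodes recording whether each early test would succeed.
sample_nodes = [
    {"out": True, "pre": True},
    {"out": False, "pre": True},
    {"out": True, "pre": False},
]
early_tests = [
    ("output", lambda n: n["out"]),
    ("predefined", lambda n: n["pre"]),
]
stats = tally_early_tests(sample_nodes, early_tests)
```

Because a failed node is pruned immediately, a node that fails output matching never reaches the predefined-type test, which is why the two tests see different numbers of nodes.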


Observing the number of failed nodes, however, does not consider the notion that 
the profile resolution improvements implicitly "prune" during the profile matching 
process. For example, the predefined sort frequency profile improvement, in many cases, 
can beat the predefined-matching signature matching improvement to the punch because 
it causes fewer operation pairs to be generated for operations with predefined types 
(operation pairs are generated when an operation in the query has an equal profile with an 
operation in the candidate). Hence, merely looking for a large number of failed nodes 
does not properly measure the full effectiveness of the complete syntactic filtering 
improvements outlined in this thesis because of the lack of orthogonality between profile 
filtering and signature matching. 


VII. CONCLUSION AND FUTURE RESEARCH 


A. ACCOMPLISHMENTS 


This thesis has presented improvements to profile filtering and signature matching 
that help multi-level filtering achieve its goal of reducing large amounts of candidate 
components early in the process. More specifically, the resolution improvements to 
syntactic profiles enable the profile filtering process to significantly cut down the number 
of components passed on to the more computationally intensive signature matching 
process. Furthermore, we have seen that large-integer representations of syntactic 
profiles and exclusive use of a profile lookup table can lead to an optimal time-and-space 
implementation. 

The improvements to signature matching included techniques for pruning the 
search-space of signature maps in an effort to find valid mappings more quickly and with fewer 
computational resources. Initial experiments have backed up the theoretical instinct that 
the signature matching improvements are sensitive to the profile resolution 
improvements. 

Finally, a detailed design and implementation of a syntactic matching software 
module that includes the improvements proposed in this thesis has been developed. The 
software has been written in Ada 95 and is mature enough for future inclusion with the 
other elements of multi-level filtering and CASE tools such as CAPS. 


B. FUTURE RESEARCH 


Future research should include more experiments with different software bases to 
better measure the effectiveness of the profile resolution improvements. Additionally, 
more data could be collected to better assess the effect profile resolution improvements 
have on signature matching. 

The implementation facilitated the collection of statistics for generic component 
expansion. More software bases with generic components should be experimented with 
to gain more insight into the bloat that generic components so quickly create. Further 
research into generic queries would also be insightful. The algorithms that instantiate 
generic components for search and retrieval preparation can also be used to instantiate a 
generic query. A study in using concurrent search and retrieval processes for each 
instantiation would certainly prove interesting. 

Finally, research into effective graphical user interfaces for the user is needed. 
The multi-level filtering concept is natural for supporting incremental updates of query 
results, much like a web-browser incrementally updates information from a web-page. 
As the efficient front-end filters finish they provide early results that can be output to the 
user quickly. The user can then either select from these results or let the search process 
continue refining them. Either way, the user is given quick feedback that is important for 
user acceptance. 
APPENDIX - SOURCE CODE 


Makefile 


#PSDL_TYPE_ROOT = /home/jsherman/MSSE/PSDL_TYPE-May97 
PSDL_TYPE_ROOT = /home2/jsherman/PSDL_TYPE-May97 


GEN = m4 generator.m4 
#GEN = gen 


INCLUDES = -I$(PSDL_TYPE_ROOT)/GNAT -I$(PSDL_TYPE_ROOT)/GENERIC_TYPES/GNAT -I$(PSDL_TYPE_ROOT)/INSTANTIATIONS/GNAT 


GEN_OBJECTS = candidate_types.adb haase_diagram.adb profile_calc.adb profile_filter_pkg.adb psdl_profile.adb run_batch.adb sig_match.adb sig_match_types.adb software_base.adb 


.SUFFIXES: .g .adb 


.g.adb: 
	$(GEN) $< > $@ 


woee- run batch test profile calc 
Gages) run batch 


run_batch: $(GEN_OBJECTS) 
	gnatmake $(INCLUDES) run_batch.adb 


test_profile_calc: $(GEN_OBJECTS) 
	gnatmake $(INCLUDES) test_profile_calc.adb 


clean: 
	rm -f *.o *.ali $(GEN_OBJECTS) test_profile_calc run_batch 


cleangen: 
	rm -f $(GEN_OBJECTS) 




candidate_types.ads 


------------------------------------------------------------------------------ 
-- Package Spec: candidate_types 
------------------------------------------------------------------------------ 

with generic_sequence_pkg; 
with ordered_set_pkg; 
with component_id_types; use component_id_types; 
with sig_match_types; use sig_match_types; 


package candidate_types is 

   RANK_UNKNOWN: constant := -1.0; 

   -- Candidate 

   type Candidate is record 
      profile_rank: float; 
      keyword_rank: float; 
      signature_matches: SigMatchNodePtrSet; 
      component_id: ComponentID; 
   end record; 

   function candidateEqual(c1: in Candidate; c2: in Candidate) return boolean; 
   function candidateLessThan(c1: in Candidate; c2: in Candidate) return boolean; 
   procedure candidateAssign(c1: in out Candidate; c2: in Candidate); 
   procedure candidatePut(the_candidate: in Candidate); 
   procedure candidatePrint(the_candidate: in Candidate); 

   function newCandidate return Candidate; 
   procedure generateSigMatchHistogram(filename: in string; c: in Candidate); 

   -- CandidateSequence 

   -- Note: should use addCandidate to add a candidate to the CandidateSequence. 
   --       addCandidate keeps the CandidateSequence sorted. 

   package candidate_sequence_pkg is new generic_sequence_pkg( 
      t => Candidate, average_size => 4); 
   subtype CandidateSequence is candidate_sequence_pkg.sequence; 

   function candidateSequenceEqual is 
      new candidate_sequence_pkg.generic_equal(eq => candidateEqual); 

   function candidateSequenceMember is 
      new candidate_sequence_pkg.generic_member(eq => candidateEqual); 

   procedure candidateSequenceRemove is 
      new candidate_sequence_pkg.generic_remove(eq => candidateEqual); 

   function candidateSequenceSort is 
      new candidate_sequence_pkg.generic_sort("<" => candidateLessThan); 

   procedure candidateSequencePut is 
      new candidate_sequence_pkg.generic_put(put => candidatePut); 

   procedure addCandidate(c: in Candidate; cs: in out CandidateSequence); 

   -- CandidateSet 

   package candidate_set_pkg is new ordered_set_pkg(t => Candidate, 
      eq => candidateEqual, "<" => candidateLessThan); 
   subtype CandidateSet is candidate_set_pkg.set; 

   procedure candidateSetPut is 
      new candidate_set_pkg.generic_put(put => candidatePut); 

   function profileSkim(profile_threshold: in float; 
      the_candidates: in CandidateSet) return CandidateSet; 

   procedure generateProfileHistogram(filename: in string; 
      the_candidates: in CandidateSet); 

end candidate_types; 
candidate_types.g 


with gnat.io; 
with ada.text_io; 
with ada.float_text_io; 
with ada.integer_text_io; 

with component_id_types; use component_id_types; 


package body candidate_types is 

   -- Function: candidateEqual 
   function candidateEqual(c1: in Candidate; c2: in Candidate) return boolean is 
   begin 
      return c1.component_id = c2.component_id; 
   end candidateEqual; 


   -- Function: candidateLessThan 
   -- Description: sort candidates in rank descending order (highest 
   --              rank first). 
   function candidateLessThan(c1: in Candidate; c2: in Candidate) return boolean is 
   begin 
      if c1.profile_rank > c2.profile_rank then 
         return true; 
      -- the following test for less-than is just being paranoid 
      -- about potential float equality problems 
      elsif c1.profile_rank < c2.profile_rank then 
         return false; 
      else 
         return c1.component_id < c2.component_id; 
      end if; 
   end candidateLessThan; 


   -- Procedure: candidateAssign 
   -- Description: makes a safe copy of a Candidate. This is primarily 
   --              necessary because of the SigMatchNodeSet 
   procedure candidateAssign(c1: in out Candidate; c2: in Candidate) is 
   begin 
      c1.profile_rank := c2.profile_rank; 
      c1.keyword_rank := c2.keyword_rank; 
      c1.component_id := c2.component_id; 
      sig_match_node_ptr_set_pkg.assign(c1.signature_matches, 
                                        c2.signature_matches); 
   end candidateAssign; 


   -- Procedure: candidatePut 
   procedure candidatePut(the_candidate: in Candidate) is 
   begin 
      gnat.io.put("("); 
      gnat.io.put(the_candidate.component_id); 
      gnat.io.put(", "); 
      ada.float_text_io.put(the_candidate.profile_rank, 1, 2, 0); 
      gnat.io.put(", "); 
      sigMatchNodePtrSetPut(the_candidate.signature_matches); 
      gnat.io.put(")"); 
   end candidatePut; 


-- Procedure: candidatePrint
procedure candidatePrint(the_candidate: in Candidate) is
begin
   gnat.io.put("Component ID: ");
   gnat.io.put(the_candidate.component_id);
   gnat.io.new_line;
   gnat.io.put("Profile Rank: ");
   ada.float_text_io.put(the_candidate.profile_rank, 1, 2, 0);
   gnat.io.new_line;
   gnat.io.put(sig_match_node_ptr_set_pkg.size(
      the_candidate.signature_matches));
   gnat.io.put(" Signature Match Solutions:");
   gnat.io.new_line;
   sigMatchNodePtrSetPrint(the_candidate.signature_matches);
end candidatePrint; 


-- Function: newCandidate
function newCandidate return Candidate is
return_val: Candidate;
begin
   return_val.profile_rank := RANK_UNKNOWN;
   return_val.keyword_rank := RANK_UNKNOWN;
   return_val.signature_matches := sig_match_node_ptr_set_pkg.empty;

   return return_val;
end newCandidate; 


-- generateSigMatchHistogram
-- Description: generates histogram data of the signature ranks for the
--              set of signature matches and saves it to a file so it can be
--              read by a charting program. The format is one line
--              for each pair where the first item of the pair is the
--              signature rank and the second item is the number of
--              signature matches with that rank.
procedure generateSigMatchHistogram(filename: in string; c: in Candidate) is
ft: ada.text_io.file_type;
last_rank: float;
count: natural := 0;
temp_snp: SigMatchNodePtr;

procedure putPair(the_rank: float; the_count: natural) is
begin
   ada.float_text_io.put(ft, the_rank, 1, 2, 0);
   ada.text_io.put(ft, ", ");
   ada.integer_text_io.put(ft, the_count);
   ada.text_io.new_line(ft);
end putPair;

begin
   ada.text_io.create(ft, ada.text_io.out_file, filename);

   if sig_match_node_ptr_set_pkg.size(c.signature_matches) = 0 then
      ada.text_io.close(ft);
      return;
   end if;
   temp_snp := sig_match_node_ptr_set_pkg.fetch(c.signature_matches, 1);
   last_rank := temp_snp.signature_rank;

   foreach((snp: SigMatchNodePtr), sig_match_node_ptr_set_pkg.scan,
      (c.signature_matches),
      if snp.signature_rank /= last_rank then
         putPair(last_rank, count);
         last_rank := snp.signature_rank;
         count := 1;
      else
         count := count + 1;
      end if;
   )
   putPair(last_rank, count);

   ada.text_io.close(ft);
end generateSigMatchHistogram;



-- Procedure: addCandidate
procedure addCandidate(c: in Candidate; cs: in out CandidateSequence) is
begin
   candidate_sequence_pkg.add(c, cs);
   cs := candidateSequenceSort(cs);
end addCandidate;



-- Function: profileSkim (for CandidateSet)
-- Description: filters out the candidates that do not meet the given
--              profile threshold.
function profileSkim(profile_threshold: in float;
   the_candidates: in CandidateSet) return CandidateSet is
return_val: CandidateSet;
begin
   return_val := candidate_set_pkg.empty;
   foreach((c: Candidate), candidate_set_pkg.scan, (the_candidates),
      if c.profile_rank >= profile_threshold then
         candidate_set_pkg.add(c, return_val);
      end if;
   )
   return return_val;
end profileSkim;


-- Procedure: generateProfileHistogram 

-- Description: generates histogram data of the profile ranks for the
--              set of candidates and saves it to a file so it can be
--              read by a charting program. The format is one line
--              for each pair where the first item of the pair is the
--              profile rank and the second item is the number of
--              candidates with that rank.
procedure generateProfileHistogram(filename: in string;
   the_candidates: in CandidateSet) is
ft: ada.text_io.file_type;
last_rank: float;
count: natural := 0;
temp_candidate: Candidate;

procedure putPair(the_rank: float; the_count: natural) is
begin
   ada.float_text_io.put(ft, the_rank, 1, 2, 0);
   ada.text_io.put(ft, ", ");
   ada.integer_text_io.put(ft, the_count);
   ada.text_io.new_line(ft);
end putPair;

begin
   ada.text_io.create(ft, ada.text_io.out_file, filename);

   if candidate_set_pkg.size(the_candidates) = 0 then
      ada.text_io.close(ft);
      return;
   end if;

   temp_candidate := candidate_set_pkg.fetch(the_candidates, 1);
   last_rank := temp_candidate.profile_rank;
   foreach((c: Candidate), candidate_set_pkg.scan, (the_candidates),
      if c.profile_rank /= last_rank then
         putPair(last_rank, count);
         last_rank := c.profile_rank;
         count := 1;
      else
         count := count + 1;
      end if;
   )
   putPair(last_rank, count);

   ada.text_io.close(ft);
end generateProfileHistogram; 


end candidate_types;
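The two histogram routines in this file rely on the candidates arriving in rank order, so that equal ranks are adjacent and one pass that counts runs is sufficient. The following is a minimal, self-contained sketch of that run counting in standard Ada. The procedure name Rank_Histogram_Sketch, the array Ranks, and the output layout are assumptions made for illustration; they stand in for the set packages and the foreach macro used in the listing.

```ada
with Ada.Text_IO;

procedure Rank_Histogram_Sketch is
   type Rank_Array is array (Positive range <>) of Float;

   -- Hypothetical input: ranks already sorted in descending order,
   -- as an ordered candidate set would deliver them.
   Ranks : constant Rank_Array := (1.0, 1.0, 0.75, 0.5, 0.5, 0.5);

   Last_Rank : Float   := Ranks (Ranks'First);
   Count     : Natural := 0;

   -- Emit one (rank, count) pair per line.
   procedure Put_Pair (The_Rank : Float; The_Count : Natural) is
   begin
      Ada.Text_IO.Put_Line
        (Float'Image (The_Rank) & "," & Natural'Image (The_Count));
   end Put_Pair;
begin
   for I in Ranks'Range loop
      if Ranks (I) /= Last_Rank then
         -- A new rank starts: flush the finished run.
         Put_Pair (Last_Rank, Count);
         Last_Rank := Ranks (I);
         Count     := 1;
      else
         Count := Count + 1;
      end if;
   end loop;
   -- Flush the final run.
   Put_Pair (Last_Rank, Count);
end Rank_Histogram_Sketch;
```

With the sample data the sketch emits three pairs, for runs of length 2, 1, and 3. The final call after the loop matters: without it the last run would never be written.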





component_id_types.ads 


-- Package Spec: component_id_types


with gnat.io;


with generic_map_pkg;
with generic_set_pkg;
with psdl_concrete_type_pkg; use psdl_concrete_type_pkg;


with psdl_profile; use psdl_profile;
package component_id_types is



-- ComponentID 


subtype ComponentID is integer; 


procedure componentIDPut (c_id: ComponentID) ; 


-- Component 
-- Note: Make sure to use createComponent to instantiate a new Component.
--       This will ensure that generics_mapping is initialized.
type Component is record
   psdl_filename: text;
   generics_mapping: GenericsMap;
end record;


function createComponent return Component;


procedure addGenericsMapping(generic_type_id: psdl_id;
   actual_type_id: psdl_id; the_component: in out Component);


function componentEqual(c1: in Component; c2: in Component) return boolean;


procedure componentPut(the_component: in Component);


-- ComponentIDMap
package component_id_map_pkg is new generic_map_pkg(
   key => ComponentID,
   result => Component,
   eq_key => "=",
   eq_res => componentEqual,
   average_size => 8);
subtype ComponentIDMap is component_id_map_pkg.map;


procedure componentIDMapPut is new component_id_map_pkg.generic_put(
   key_put => gnat.io.put, res_put => componentPut);


-- ComponentIDSet 
package component_id_set_pkg is new generic_set_pkg(
   t => ComponentID,
   average_size => 8,
   eq => "=");
subtype ComponentIDSet is component_id_set_pkg.set; 


procedure componentIDSetPut is 
new component_id_set_pkg.generic_put(put => gnat.io.put); 


procedure componentIDSetFilePut is 
   new component_id_set_pkg.generic_file_put(put => componentIDPut);


end component_id_types;

component_id_types.adb


with gnat.io; 
with text_io; 


with psdl_concrete_type_pkg; use psdl_concrete_type_pkg;


package body component_id_types is


-- Procedure: componentIDPut
procedure componentIDPut(c_id: ComponentID) is
begin
   text_io.put(integer'image(c_id));
end componentIDPut; 


-- Procedure: createComponent
function createComponent return Component is
return_val: Component;
begin
   generics_map_pkg.create(empty, return_val.generics_mapping);
   return return_val;
end createComponent;


-- Procedure: addGenericsMapping
procedure addGenericsMapping(generic_type_id: psdl_id;
   actual_type_id: psdl_id; the_component: in out Component) is
begin
   generics_map_pkg.bind(generic_type_id, actual_type_id,
      the_component.generics_mapping);
end addGenericsMapping; 


-- Function: componentEqual 
function componentEqual(c1: in Component; c2: in Component) return boolean is
begin
   if not eq(c1.psdl_filename, c2.psdl_filename) then
      return false;
   end if;

   return generics_map_pkg.equal(c1.generics_mapping, c2.generics_mapping);
end componentEqual; 


-- Procedure: componentPut 


procedure componentPut(the_component: in Component) is
begin
   gnat.io.put(convert(the_component.psdl_filename));
   gnat.io.put(" | ");
   genericsMapPut(the_component.generics_mapping);
end componentPut;


end component_id_types;

haase_diagram.ads


with generic_map_pkg; 


with profile_types; use profile_types;
with component_id_types; use component_id_types;


package haase_diagram is


-- Types
-- type HaaseNode is private;
-- type HaaseDiagram is private;



-- HaaseNode 

type HaaseNode is record 
key: ComponentProfile; 
components: ComponentIDSet; 
children: ComponentProfileSet; 

end record; 


function haaseNodeEqual(hn1: in HaaseNode; hn2: in HaaseNode)
   return boolean;


procedure haaseNodeAssign(hn1: in out HaaseNode; hn2: in HaaseNode);
procedure haaseNodePut(the_haase_node: in HaaseNode);


procedure haaseNodePrint(the_haase_node: HaaseNode);


-- HaaseDiagram
package haase_node_map_pkg is new generic_map_pkg(
   key => ComponentProfile,
   result => HaaseNode,
   eq_key => componentProfileEqual,
   eq_res => haaseNodeEqual,
   average_size => 8);
subtype HaaseDiagram is haase_node_map_pkg.map;


procedure haaseDiagramPut is new haase_node_map_pkg.generic_put(
   key_put => componentProfilePut, res_put => haaseNodePut);


procedure haaseDiagramPrint(the_haase_diagram: HaaseDiagram);


procedure generateGML(the_haase_diagram: in HaaseDiagram;
   filename: in string);


-- Operations


function createHaaseNode(key: in ComponentProfile) return HaaseNode;
function createHaaseDiagram return HaaseDiagram;


procedure addComponent(the_comp_id: in ComponentID;
   the_haase_node: in out HaaseNode);


procedure addChild(the_child_key: in ComponentProfile;
   the_haase_node: in out HaaseNode);


procedure addHaaseNode(the_haase_node: in HaaseNode;
   the_haase_diagram: in out HaaseDiagram);


procedure addBaseNodes(the_haase_diagram: in out HaaseDiagram);


procedure connectNodes(the_haase_diagram: in out HaaseDiagram);


-- private


end haase_diagram;

haase_diagram.g


with text_io; use text_io; 
with generic_map_pkg; 


with profile_types; use profile_types;
with component_id_types; use component_id_types;
with psdl_profile; use psdl_profile;
with software_base;


package body haase_diagram is


-- Function: createHaaseNode
-- Description: create and initialize a HaaseNode for use.
function createHaaseNode(key: in ComponentProfile) return HaaseNode is
return_val: HaaseNode;
begin
   profile_id_sequence_pkg.assign(return_val.key, key);
   return_val.components := component_id_set_pkg.empty;
   return_val.children := component_profile_set_pkg.empty;

   return return_val;
end createHaaseNode;


-- Function: createHaaseDiagram 


-- Description: create and initialize a HaaseDiagram for use. 
function createHaaseDiagram return HaaseDiagram is 
begin 
   return haase_node_map_pkg.create(
      createHaaseNode(profile_id_sequence_pkg.empty));
end createHaaseDiagram; 


-- Function: addComponent 


-- Description: add a ComponentID to the HaaseNode. 
procedure addComponent(the_comp_id: in ComponentID;
   the_haase_node: in out HaaseNode) is
begin
   component_id_set_pkg.add(the_comp_id, the_haase_node.components);
end addComponent;


-- Function: addChild
-- Description: add a ComponentProfile that represents the
--              key to a child HaaseNode to the HaaseNode.
procedure addChild(the_child_key: in ComponentProfile;
   the_haase_node: in out HaaseNode) is
begin
   component_profile_set_pkg.add(the_child_key, the_haase_node.children);
end addChild; 


-- Function: addHaaseNode 


-- Description: add a HaaseNode to the HaaseDiagram. 
procedure addHaaseNode(the_haase_node: in HaaseNode;
   the_haase_diagram: in out HaaseDiagram) is
temp_key: ComponentProfile;
begin
   profile_id_sequence_pkg.assign(temp_key, the_haase_node.key);
   haase_node_map_pkg.bind(temp_key, the_haase_node, the_haase_diagram);
end addHaaseNode; 


-- Procedure: addBaseNodes 
-- Description: add base nodes for the nodes already in the diagram. 
-- This is done by adding a node for each profile in 


-- the key for each node in the diagram. Note, duplicates 
-- will not be added. 


procedure addBaseNodes(the_haase_diagram: in out HaaseDiagram) is
new_diagram: HaaseDiagram;
new_node: HaaseNode;
new_key: ComponentProfile;
begin
   new_diagram := createHaaseDiagram;
   haase_node_map_pkg.assign(new_diagram, the_haase_diagram);
   new_key := profile_id_sequence_pkg.empty;

   -- foreach((node_key: ComponentProfile; node: HaaseNode),
   --    haase_node_map_pkg.scan, (the_haase_diagram),
   --    foreach((p_id: ProfileID), profile_id_sequence_pkg.scan,
   --       (node_key),
   foreach((p_id: ProfileID),
      profile_lookup_table_pkg.res_set_pkg.scan,
      (software_base.getProfileIDs),
      addProfileID(p_id, new_key);
      if not haase_node_map_pkg.member(new_key, the_haase_diagram) then
         new_node := createHaaseNode(new_key);
         addHaaseNode(new_node, new_diagram);
      end if;
      new_key := profile_id_sequence_pkg.empty;
   )
   -- ) )

   haase_node_map_pkg.assign(the_haase_diagram, new_diagram);
   haase_node_map_pkg.recycle(new_diagram);
end addBaseNodes; 


-- Procedure: connectNodes
-- Description: connect nodes in diagram. Invariant:
--              n2 is a child of n1 iff subbag(n1.key, n2.key) and
--              there is no node n3 such that subbag(n1.key, n3.key)
--              and subbag(n3.key, n2.key).
--
--              Note, an entirely new diagram is constructed because
--              scan returns copies of the nodes in the haase diagram,
--              not the actual nodes.
procedure connectNodes(the_haase_diagram: in out HaaseDiagram) is
new_node: HaaseNode;
new_diagram: HaaseDiagram;
found_n3: boolean;
begin
   new_diagram := createHaaseDiagram;
   foreach((n1_key: ComponentProfile; n1: HaaseNode),
      haase_node_map_pkg.scan, (the_haase_diagram),
      new_node := createHaaseNode(n1_key);
      haaseNodeAssign(new_node, n1);

      foreach((n2_key: ComponentProfile; n2: HaaseNode),
         haase_node_map_pkg.scan, (the_haase_diagram),
         if not haaseNodeEqual(n1, n2) then
            if subbag(n1_key, n2_key) then
               found_n3 := false;
               foreach((n3_key: ComponentProfile; n3: HaaseNode),
                  haase_node_map_pkg.scan, (the_haase_diagram),
                  if not found_n3 then
                     if (not haaseNodeEqual(n1, n3)) and
                        (not haaseNodeEqual(n2, n3)) then
                        if subbag(n1_key, n3_key) and
                           subbag(n3_key, n2_key) then
                           found_n3 := true;
                        end if;
                     end if;
                  end if;
               )
               if not found_n3 then
                  addChild(n2_key, new_node);
               end if;
            end if;
         end if;
      )
      addHaaseNode(new_node, new_diagram);
   )
   haase_node_map_pkg.assign(the_haase_diagram, new_diagram);
   haase_node_map_pkg.recycle(new_diagram);
end connectNodes; 
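
The invariant documented for connectNodes is the covering relation of the subbag order: an edge is kept only when no third node lies strictly between its endpoints. The sketch below illustrates that rule in standard Ada under simplifying assumptions. The names Covering_Sketch, Bag, Nodes, and Subbag are hypothetical, bags are stored as per-ID occurrence counts rather than as sorted ID sequences, and all nodes are assumed to carry distinct bags.

```ada
with Ada.Text_IO;

procedure Covering_Sketch is
   -- A bag over profile IDs 1 .. 3, stored as one occurrence count per ID.
   type Bag is array (1 .. 3) of Natural;
   type Bag_List is array (Positive range <>) of Bag;

   -- Hypothetical node keys; each node is assumed to have a distinct bag.
   Nodes : constant Bag_List :=
     (1 => (1, 0, 0),    -- {1}
      2 => (1, 1, 0),    -- {1, 2}
      3 => (1, 1, 1));   -- {1, 2, 3}

   -- True when every ID occurs in A at most as often as in B.
   function Subbag (A, B : Bag) return Boolean is
   begin
      for K in Bag'Range loop
         if A (K) > B (K) then
            return False;
         end if;
      end loop;
      return True;
   end Subbag;

   Found_Between : Boolean;
begin
   for N1 in Nodes'Range loop
      for N2 in Nodes'Range loop
         if N1 /= N2 and then Subbag (Nodes (N1), Nodes (N2)) then
            -- Look for a third node strictly between N1 and N2.
            Found_Between := False;
            for N3 in Nodes'Range loop
               if N3 /= N1 and then N3 /= N2
                 and then Subbag (Nodes (N1), Nodes (N3))
                 and then Subbag (Nodes (N3), Nodes (N2))
               then
                  Found_Between := True;
               end if;
            end loop;
            if not Found_Between then
               Ada.Text_IO.Put_Line
                 ("edge" & Integer'Image (N1) & " ->" & Integer'Image (N2));
            end if;
         end if;
      end loop;
   end loop;
end Covering_Sketch;
```

For the three sample nodes, the sketch reports edges from node 1 to node 2 and from node 2 to node 3. The pair (1, 3) also satisfies the subbag test but is suppressed because node 2 lies between them. As in the listing, the triple loop is cubic in the number of nodes.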


-- Function: haaseNodeEqual 

-- Description: checks for equality of two haase nodes by 

-- comparing the keys. 

function haaseNodeEqual(hn1: in HaaseNode; hn2: in HaaseNode)
   return boolean is

begin
   return componentProfileEqual(hn1.key, hn2.key);

end haaseNodeEqual; 


-- Procedure: haaseNodeAssign 


-- Description: creates a duplicate of hn2. 
procedure haaseNodeAssign(hn1: in out HaaseNode; hn2: in HaaseNode) is
begin
   profile_id_sequence_pkg.assign(hn1.key, hn2.key);
   component_id_set_pkg.assign(hn1.components, hn2.components);
   component_profile_set_pkg.assign(hn1.children, hn2.children);
end haaseNodeAssign;


-- Procedure: haaseNodePut
procedure haaseNodePut(the_haase_node: in HaaseNode) is
begin
   componentProfilePut(the_haase_node.key);
   put(" | ");
   componentIDSetPut(the_haase_node.components);
   put(" | ");
   componentProfileSetPut(the_haase_node.children);
end haaseNodePut;


-- Procedure: haaseNodePrint
procedure haaseNodePrint(the_haase_node: in HaaseNode) is
begin
   put("Key: ");
   componentProfilePut(the_haase_node.key);
   new_line;
   put("Components: ");
   componentIDSetPut(the_haase_node.components);
   new_line;
   put("Children: ");
   componentProfileSetPut(the_haase_node.children);
   new_line;
end haaseNodePrint;


-- Procedure: haaseDiagramPrint 
procedure haaseDiagramPrint(the_haase_diagram: in HaaseDiagram) is
begin
   foreach((node_key: ComponentProfile; node: HaaseNode),
      haase_node_map_pkg.scan, (the_haase_diagram),
      haaseNodePrint(node);
      new_line;
   )
   new_line;
end haaseDiagramPrint; 


-- Procedure: generateGML
-- Description: generate a GML file to graphically represent the
--              HaaseDiagram.
procedure generateGML(the_haase_diagram: in HaaseDiagram;
   filename: in string) is
id: natural := 0; -- unique ID counter
the_id: natural;
gml_file: file_type;


function new_id return natural is
begin
   id := id + 1;
   return id;
end new_id;


package temp_map_pkg is new generic_map_pkg(
   key => ComponentProfile,
   result => natural,
   eq_key => componentProfileEqual,
   eq_res => "=",
   average_size => 8);
subtype tempMap is temp_map_pkg.map;


temp_map: tempMap;


begin
   create(gml_file, out_file, filename);
   put(gml_file, "graph [ id");
   put(gml_file, integer'image(new_id));
   put_line(gml_file, " directed 1");

   temp_map_pkg.create(id, temp_map);

   -- make the nodes
   foreach((node_key: ComponentProfile; node: HaaseNode),
      haase_node_map_pkg.scan, (the_haase_diagram),
      put(gml_file, " node [ id");
      the_id := new_id;
      put(gml_file, integer'image(the_id));
      put(gml_file, " label """);
      componentProfileFilePut(gml_file, node_key);
      -- put_line(gml_file, "");
      -- componentIDSetFilePut(gml_file, node.components);
      put_line(gml_file, """ ]");
      temp_map_pkg.bind(node.key, the_id, temp_map);
   )

   -- make the edges
   foreach((node_key: ComponentProfile; node: HaaseNode),
      haase_node_map_pkg.scan, (the_haase_diagram),
      foreach((child_key: ComponentProfile),
         component_profile_set_pkg.scan, (node.children),
         put(gml_file, " edge [ id");
         put(gml_file, integer'image(new_id));
         put(gml_file, " source ");
         put(gml_file, integer'image(temp_map_pkg.fetch(temp_map,
            node.key)));
         put(gml_file, " target ");
         put(gml_file, integer'image(temp_map_pkg.fetch(temp_map,
            child_key)));
         put_line(gml_file, " ]");
      )
   )

   put_line(gml_file, "]");
   close(gml_file);
   temp_map_pkg.recycle(temp_map);
end generateGML; 


end haase_diagram;

profile_calc.ads


-- Package Spec: profile_calc


-- This package contains functions and types that support the computation 
-- of profiles from numeric representations of signatures. 


-- Description of numeric signatures: Positive integers represent 
-- instances of non-generic types in the signature. Negative integers 


-- represent instances of generic types in the signature. Finally,
-- a 0 is used to terminate the array of integers representing the
-- signature.


-- Examples of numeric signatures:

-- [integer, char, float -> integer] ==> [1,2,3,1,0]
-- [integer, generic, float -> float] ==> [1,-1,2,3,0]
-- [generic1, generic2 -> generic2] ==> [-1,-2,-2,0]


-- Profiles are sequences of integers. 


-- Generic Types: 

-- Generic types cause more than one profile to be generated for a 

-- single signature. Hence, computeArrayProfileWithGenerics returns an 
-- array of ArrayProfiles, ProfileValues, bound by NumProfiles. 


-- ArrayProfiles are terminated with PROFILE_TERMINATOR. For example,
-- the profile [3,1,1,2] is returned as [3,1,1,2,-99].


-- Eventually a different method for handling generic types will be 
-- employed and will likely do away with the ArrayProfile data type. 


with profile_types; use profile_types;


package profile_calc is


-- Types
MAX_SIG_LENGTH: constant := 100;
MAX_PROFILE_LENGTH: constant := 100;
MAX_PROFILE_VARIATIONS: constant := 100; -- for generic types
PROFILE_TERMINATOR: constant := -99;


subtype SignatureLengthRange is Positive range 1..MAX_SIG_LENGTH;
subtype ProfileLengthRange is Positive range 1..MAX_PROFILE_LENGTH;
subtype ProfileVariationRange is Positive range 1..MAX_PROFILE_VARIATIONS;


type Signature is array (SignatureLengthRange) of Integer;
type ArrayProfile is array (ProfileLengthRange) of Integer;
type ArrayProfiles is array (ProfileVariationRange) of ArrayProfile;


-- Functions
function computeProfile(T: in Signature) return Profile;
function computeArrayProfile(T: in Signature) return ArrayProfile;


-- note NumProfiles should be 0..MAX_PROFILE_VARIATIONS, not Natural
procedure computeArrayProfileWithGenerics(
   T: in Signature;
   ProfileValues: out ArrayProfiles;
   NumProfiles: out Natural);


function printSignature(sig: Signature) return SignatureLengthRange;
function printArrayProfile(prof: ArrayProfile) return ProfileLengthRange;


end profile_calc;

profile_calc.g


with gnat.io; use gnat.io;
with profile_types; use profile_types;


package body profile_calc is


-- Function: convertToSequence 

-- Description: helper function to convert an ArrayProfile (an 
--              array of ints terminated with PROFILE_TERMINATOR)
--              to a Profile (a sequence of ints).


function convertToSequence(Prof: ArrayProfile) return Profile is
return_val: Profile;
i, count: ProfileLengthRange;
begin
   count := 1;
   while Prof(count) /= PROFILE_TERMINATOR and count <= MAX_PROFILE_LENGTH loop
      count := count + 1;
   end loop;
   count := count - 1;

   return_val := 0;
   for i in 1..count loop
      return_val := return_val + (long_long_integer(Prof(i)) *
         (10 ** (count - i)));
   end loop;

   return return_val;
end convertToSequence; 
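
The conversion above folds the entries of an ArrayProfile into one large integer so that profiles can be compared and stored as scalars. The sketch below shows the same packing in standard Ada using Horner's rule. It assumes a decimal weighting with every entry in 0 .. 9, which is an assumption about the encoding rather than something the listing states; the names Pack_Profile_Sketch, Int_Array, and Pack are hypothetical.

```ada
with Ada.Text_IO;

procedure Pack_Profile_Sketch is
   type Int_Array is array (Positive range <>) of Natural;

   -- Pack the entries of P into one integer, most significant entry first.
   -- Assumes every entry is a single decimal digit (0 .. 9); larger
   -- entries would collide with their neighbours.
   function Pack (P : Int_Array) return Long_Long_Integer is
      Result : Long_Long_Integer := 0;
   begin
      for I in P'Range loop
         Result := Result * 10 + Long_Long_Integer (P (I));
      end loop;
      return Result;
   end Pack;
begin
   -- The profile [3,1,1,2] from the package spec comment.
   Ada.Text_IO.Put_Line (Long_Long_Integer'Image (Pack ((3, 1, 1, 2))));
end Pack_Profile_Sketch;
```

For the profile [3,1,1,2] the packed value is 3112. Horner's rule computes the same sum of entry times power of ten as an explicit exponent per position, without evaluating the power each time.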


function printSignature(Sig: Signature) return SignatureLengthRange is
Num: SignatureLengthRange;
begin
   Num := 1;
   Put("[");

   while Sig(Num + 1) /= 0 loop
      Put(Sig(Num));
      if Sig(Num + 2) /= 0 then
         Put(",");
      end if;
      Num := Num + 1;
   end loop;
   Put(" -> ");
   Put(Sig(Num));
   Put("]");
   return Num;
end printSignature;


function printArrayProfile(Prof: ArrayProfile) return ProfileLengthRange is
Num: ProfileLengthRange;
begin
   Num := 1;
   Put("[");

   while Prof(Num) /= PROFILE_TERMINATOR and Num < MAX_PROFILE_LENGTH loop
      Put(Prof(Num));
      if Prof(Num + 1) /= PROFILE_TERMINATOR then
         Put(",");
      end if;
      Num := Num + 1;
   end loop;
   Put("]");
   return Num;
end printArrayProfile; 


function computeProfile(T: Signature) return Profile is 
begin 

   return convertToSequence(computeArrayProfile(T));
end computeProfile;


function computeArrayProfile(T: Signature) return ArrayProfile is
Result: ArrayProfile;
Result_Count: Integer;
NumResSort: Integer;
NumOneSorts: Integer;
I, J: Integer;
L: SignatureLengthRange;
SortValues: array (SignatureLengthRange) of Integer;
SortNums: array (SignatureLengthRange) of Integer;
NumSorts: Integer;
Found: Boolean;


begin
   -- Compute Profile[1], Total Number of Sorts.
   Result_Count := 1;
   J := 0;

   -- set L to number of elements in T
   -- note, this is the first number in the profile
   I := 1;
   while (T(I) /= 0 and I <= MAX_SIG_LENGTH) loop
      I := I + 1;
   end loop;
   L := I - 1;
   Result(Result_Count) := L;


   -- Compute Profile[2], Number of Times Result Sort in Signature.
   -- note, Nguyen's thesis just uses 0 or 1 to indicate if the
   -- result sort is used in the input arguments. Representing
   -- the number of times the result sort is used is finer resolution,
   -- which should partition the software base better.
   NumResSort := 0;
   for I in 1..L loop
      if T(I) = T(L) then
         NumResSort := NumResSort + 1;
      end if;
   end loop;
   Result_Count := Result_Count + 1;
   -- Herman
   -- Result(Result_Count) := NumResSort;
   -- Nguyen
   if NumResSort > 1 then
      Result(Result_Count) := 1;
   else
      Result(Result_Count) := 0;
   end if;


   -- Herman Improvement Profile[3]
   -- Add the number of occurrences of the type being defined by the
   -- component (if the component is a type).
   Result_Count := Result_Count + 1;
   Result(Result_Count) := T(L+2);


   -- Herman Improvement Profile[4..8]
   -- Add the number of occurrences of types in the basic sort groups
   Result_Count := Result_Count + 1;
   Result(Result_Count) := T(L+3);
   Result_Count := Result_Count + 1;
   Result(Result_Count) := T(L+4);
   Result_Count := Result_Count + 1;
   Result(Result_Count) := T(L+5);
   Result_Count := Result_Count + 1;
   Result(Result_Count) := T(L+6);
   Result_Count := Result_Count + 1;
   Result(Result_Count) := T(L+7);


   -- Generate Helper Arrays
   -- SortValues: an ordered SET of sort values
   --    e.g. if the Signature input T was [1, 1, 2, 1, 0]
   --    SortValues would be [1, 2]
   -- NumSorts: the cardinality of the ordered set SortValues
   --    e.g. in the above example, NumSorts would be 2
   -- SortNums: the cardinality of each sort in SortValues
   --    e.g. in the above example, SortNums would be [3, 1]
   for I in 1..L loop
      SortNums(I) := 0;
   end loop;
   SortValues(1) := T(1);
   NumSorts := 1;
   SortNums(1) := 1;
   for I in 2..L loop
      Found := False;
      for J in 1..NumSorts loop
         if T(I) = SortValues(J) then
            SortNums(J) := SortNums(J) + 1;
            Found := True;
         end if;
      end loop;
      if not Found then
         NumSorts := NumSorts + 1;
         SortValues(NumSorts) := T(I);
         SortNums(NumSorts) := 1;
      end if;
   end loop;


   -- Becomes Profile[9]
   -- Compute Profile[3], Number of Sort Groups of Size One.
   NumOneSorts := 0;
   for I in 1..NumSorts loop
      if SortNums(I) = 1 then
         NumOneSorts := NumOneSorts + 1;
      end if;
   end loop;
   Result_Count := Result_Count + 1;
   Result(Result_Count) := NumOneSorts;


   -- Becomes Profile[10..N]
   -- Compute Profile[4..N], Sequence of Sizes of the Sort Groups that
   -- Have Size Greater than One.
   for I in 0..L-2 loop
      for J in 1..NumSorts loop
         if SortNums(J) = L-I then
            Result_Count := Result_Count + 1;
            Result(Result_Count) := L-I;
         end if;
      end loop;
   end loop;


   -- Terminate the ArrayProfile
   Result(Result_Count + 1) := PROFILE_TERMINATOR;
   return Result;

end computeArrayProfile; 
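
The helper arrays in computeArrayProfile group the entries of a numeric signature by sort and then report how many groups have size one and which group sizes exceed one, largest first. The sketch below reproduces only that grouping step in standard Ada, using the sample signature [1, 1, 2, 1] from the comments. The names Sort_Groups_Sketch, Values, Counts, and Singletons are hypothetical, and the signature is given without its 0 terminator or the extra counts that the listing reads from T(L+2) onward.

```ada
with Ada.Text_IO;

procedure Sort_Groups_Sketch is
   type Int_Array is array (Positive range <>) of Integer;

   -- Hypothetical numeric signature, without the 0 terminator.
   T : constant Int_Array := (1, 1, 2, 1);

   Values     : Int_Array (T'Range);                   -- distinct sorts seen so far
   Counts     : Int_Array (T'Range) := (others => 0);  -- occurrences of each sort
   Num_Sorts  : Natural := 0;
   Singletons : Natural := 0;
   Found      : Boolean;
begin
   -- Group the signature entries by sort value.
   for I in T'Range loop
      Found := False;
      for J in 1 .. Num_Sorts loop
         if T (I) = Values (J) then
            Counts (J) := Counts (J) + 1;
            Found := True;
         end if;
      end loop;
      if not Found then
         Num_Sorts := Num_Sorts + 1;
         Values (Num_Sorts) := T (I);
         Counts (Num_Sorts) := 1;
      end if;
   end loop;

   -- Count the groups of size one.
   for J in 1 .. Num_Sorts loop
      if Counts (J) = 1 then
         Singletons := Singletons + 1;
      end if;
   end loop;

   Ada.Text_IO.Put_Line ("length:" & Integer'Image (T'Length));
   Ada.Text_IO.Put_Line ("singleton groups:" & Integer'Image (Singletons));

   -- List group sizes greater than one, largest first.
   Ada.Text_IO.Put ("group sizes > 1:");
   for Size in reverse 2 .. T'Length loop
      for J in 1 .. Num_Sorts loop
         if Counts (J) = Size then
            Ada.Text_IO.Put (Integer'Image (Size));
         end if;
      end loop;
   end loop;
   Ada.Text_IO.New_Line;
end Sort_Groups_Sketch;
```

For the sample signature, the sort 1 forms a group of size 3 and the sort 2 forms a group of size 1, so the sketch reports length 4, one singleton group, and the single size 3. Scanning sizes from largest to smallest keeps the size sequence in a canonical order, so signatures that differ only in argument order yield the same profile.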


procedure computeArrayProfileWithGenerics(
   T: in Signature;
   ProfileValues: out ArrayProfiles;
   NumProfiles: out Natural) is
I, G, J, K: Integer;
L: SignatureLengthRange;
NewSig: Signature;
NumGenerics: Integer;
NumDiffGenerics: Integer;
Found: Boolean;
Valj: Integer;
GenericPos: array (SignatureLengthRange) of Integer;
ProfileVal: ArrayProfile;
begin
   NumGenerics := 0;
   NumProfiles := 0;
   Valj := 0;
   NumDiffGenerics := 0;
   G := 0;
   J := 0;
   K := 0;

   -- set L to number of elements in T
   I := 1;
   while (T(I) /= 0 and I <= MAX_SIG_LENGTH) loop
      I := I + 1;
   end loop;
   L := I - 1;

   for I in 1..L loop
      if T(I) < 0 then
         if T(I) < NumDiffGenerics then
            NumDiffGenerics := T(I);
         end if;
         NumGenerics := NumGenerics + 1;
         GenericPos(NumGenerics) := I;
      end if;
   end loop;
   NumDiffGenerics := -1 * NumDiffGenerics;
   if NumGenerics = 0 then
      NumProfiles := 1;
      ProfileVal := computeArrayProfile(T);
      ProfileValues(1) := ProfileVal;
   else
      for G in 1..NumDiffGenerics loop
         for I in 1..L loop
            NewSig(I) := T(I);
         end loop;
         NewSig(L+1) := 0;
         for J in 1..L loop
            for I in 1..NumGenerics loop
               if T(GenericPos(I)) >= -1 * G then
                  NewSig(GenericPos(I)) := T(J);
               end if;
            end loop;

            -- These following lines are good for debugging.
            -- They print out all the combinations of signatures computed
            Valj := printSignature(NewSig);
            New_Line;
            ProfileVal := computeArrayProfile(NewSig);
            if NumProfiles = 0 then
               NumProfiles := 1;
               ProfileValues(1) := ProfileVal;
            else
               Found := False;
               for K in 1..NumProfiles loop
                  if ProfileValues(K) = ProfileVal then
                     Found := True;
                  end if;
               end loop;
               if not Found then
                  NumProfiles := NumProfiles + 1;
                  ProfileValues(NumProfiles) := ProfileVal;
               end if;
            end if;
         end loop;
      end loop;
   end if;
end computeArrayProfileWithGenerics; 


end profile_calc;

profile_filter_pkg.ads


with haase_diagram; use haase_diagram;
with candidate_types; use candidate_types;
with profile_types; use profile_types;
package profile_filter_pkg is


function findCandidates(query_profile: in ComponentProfile;
   the_haase_diagram: in HaaseDiagram) return CandidateSet;


end profile_filter_pkg;

profile_filter_pkg.g


with haase_diagram; use haase_diagram;
with candidate_types; use candidate_types;
with component_id_types; use component_id_types;


package body profile_filter_pkg is


-- Function: findCandidates 

-- Description: for each profile in query_profile start at the base-node
--              that represents that profile and perform a depth-first
--              search on the haase-diagram. At each node calculate the
--              profile rank, create a Candidate with that rank and the
--              components in that node, and add it to return_val.


function findCandidates(query_profile: in ComponentProfile;
   the_haase_diagram: in HaaseDiagram) return CandidateSet is
return_val: CandidateSet;
base_node: HaaseNode;
base_node_key: ComponentProfile;
num_matches: natural;
i, j: natural;


procedure DFSFW(hn: in HaaseNode) is
temp_candidate: Candidate;
begin
   -- count the number of profiles in the node that
   -- are also in the query
   num_matches := 0;
   i := 1;
   j := 1;
   while i <= profile_id_sequence_pkg.length(query_profile) and
      j <= profile_id_sequence_pkg.length(hn.key) loop
      if profile_id_sequence_pkg.fetch(query_profile, i) =
         profile_id_sequence_pkg.fetch(hn.key, j) then
         num_matches := num_matches + 1;
         i := i + 1;
         j := j + 1;
      elsif profileIDLessThan(profile_id_sequence_pkg.fetch(query_profile, i),
         profile_id_sequence_pkg.fetch(hn.key, j)) then
         i := i + 1;
      else
         j := j + 1;
      end if;
   end loop;


   -- add the node's components to return_val
   foreach((comp_id: ComponentID), component_id_set_pkg.scan,
      (hn.components),
      temp_candidate := newCandidate;
      temp_candidate.profile_rank :=
         float(num_matches) / float(profile_id_sequence_pkg.length(query_profile));
      temp_candidate.component_id := comp_id;
      candidate_set_pkg.add(temp_candidate, return_val);
   )


   -- recursively call DFSFW on each child
   foreach((child: ComponentProfile), component_profile_set_pkg.scan,
      (hn.children),
      DFSFW(haase_node_map_pkg.fetch(the_haase_diagram, child));
   )
end DFSFW;


begin
   return_val := candidate_set_pkg.empty;


   foreach((p_id: ProfileID), profile_id_sequence_pkg.scan, (query_profile),
      base_node_key := profile_id_sequence_pkg.empty;
      addProfileID(p_id, base_node_key);
      if haase_node_map_pkg.member(base_node_key, the_haase_diagram) then
         base_node :=
            haase_node_map_pkg.fetch(the_haase_diagram, base_node_key);
         DFSFW(base_node);
      end if;
   )


return return_val; 
end findCandidates; 
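
The rank computed inside DFSFW is the fraction of query profile IDs that also appear in the node key, found by walking both sorted sequences in step. The sketch below isolates that calculation in standard Ada. The names Profile_Rank_Sketch, ID_Array, and Matches, and the sample data, are hypothetical; plain arrays stand in for the sequence package, and both inputs are assumed to be sorted in ascending order.

```ada
with Ada.Text_IO;

procedure Profile_Rank_Sketch is
   type ID_Array is array (Positive range <>) of Integer;

   -- Count the entries shared by two ascending sequences, treating them
   -- as bags: each entry can be matched at most once.
   function Matches (Query, Key : ID_Array) return Natural is
      I : Positive := Query'First;
      J : Positive := Key'First;
      N : Natural  := 0;
   begin
      while I <= Query'Last and then J <= Key'Last loop
         if Query (I) = Key (J) then
            N := N + 1;
            I := I + 1;
            J := J + 1;
         elsif Query (I) < Key (J) then
            I := I + 1;   -- Query (I) cannot occur later in Key
         else
            J := J + 1;   -- Key (J) cannot occur later in Query
         end if;
      end loop;
      return N;
   end Matches;

   Query : constant ID_Array := (2, 5, 7, 9);
   Key   : constant ID_Array := (2, 7, 8);

   -- Rank is the share of query entries found in the node key.
   Rank : constant Float :=
     Float (Matches (Query, Key)) / Float (Query'Length);
begin
   Ada.Text_IO.Put_Line ("rank =" & Float'Image (Rank));
end Profile_Rank_Sketch;
```

With the sample data two of the four query IDs (2 and 7) occur in the key, giving a rank of 0.5. The merge-style walk is linear in the combined length of the two sequences, which is why the profile IDs need to be kept sorted.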


end profile_filter_pkg;

profile_types.ads


with gnat.io;


with generic_sequence_pkg;
with generic_set_pkg;
with ordered_map_pkg;


package profile_types is


procedure myIntPut(i: integer); 


-- Profile 


-- package int_sequence_pkg is new generic_sequence_pkg(
--    t => integer, average_size => 4);
-- subtype Profile is int_sequence_pkg.sequence;


-- function profileEqual is new int_sequence_pkg.generic_equal(eq => "=");
-- function profileLessThan is new int_sequence_pkg.generic_less_than("<" => "<");
-- procedure profilePut is new int_sequence_pkg.generic_put(put => gnat.io.put);
-- procedure profileFilePut is new int_sequence_pkg.generic_put(put => myIntPut);


subtype Profile is long_long_integer;


function profileEqual(p1, p2: Profile) return boolean;
function profileLessThan(p1, p2: Profile) return boolean;
procedure profilePut(p: Profile); 

procedure profileFilePut(p: Profile); 


   -- ProfileID

   subtype ProfileID is integer;

   function profileIDLessThan(p1, p2: ProfileID) return boolean;
   procedure profileIDPut(p_id: ProfileID);
   procedure profileIDFilePut(p_id: ProfileID);

   -- ProfileLookupTable
   DEFAULT_PROFILE_ID: constant := -1;
   package profile_lookup_table_pkg is new ordered_map_pkg(
      key => Profile,
      result => ProfileID,
      eq_key => profileEqual,
      eq_res => "=",
      "<" => profileLessThan);
   subtype ProfileLookupTable is profile_lookup_table_pkg.map;

   procedure profileLookupTablePut is new profile_lookup_table_pkg.generic_put(
      key_put => profilePut, res_put => profileIDPut);


   -- ComponentProfile

   -- Note: should use addProfileID to add a profile id to the ComponentProfile.
   --       addProfileID keeps the ComponentProfile sorted which is important
   --       for equality and subbag (multiset subset) testing.

   package profile_id_sequence_pkg is new generic_sequence_pkg(
      t => ProfileID, average_size => 4);
   subtype ComponentProfile is profile_id_sequence_pkg.sequence;

   function componentProfileEqual is
      new profile_id_sequence_pkg.generic_equal(eq => "=");

   function componentProfileMember is
      new profile_id_sequence_pkg.generic_member(eq => "=");

   procedure componentProfileRemove is
      new profile_id_sequence_pkg.generic_remove(eq => "=");

   function componentProfileSort is
      new profile_id_sequence_pkg.generic_sort("<" => "<");

   function componentProfileLessThan is
      new profile_id_sequence_pkg.generic_less_than("<" => profileIDLessThan);

   procedure componentProfilePut is
      new profile_id_sequence_pkg.generic_put(put => profileIDPut);

   procedure componentProfileFilePut is
      new profile_id_sequence_pkg.generic_file_put(put => profileIDFilePut);

   function subbag is
      new profile_id_sequence_pkg.generic_subsequence(eq => "=");

   package component_profile_set_pkg is new generic_set_pkg(
      t => ComponentProfile, eq => componentProfileEqual, average_size => 8);
   subtype ComponentProfileSet is component_profile_set_pkg.set;

   procedure componentProfileSetPut is
      new component_profile_set_pkg.generic_put(put => componentProfilePut);

   procedure addProfileID(p_id: in ProfileID; cp: in out ComponentProfile);
   procedure addProfiles(new_profiles: in ComponentProfile;
                         target: in out ComponentProfile);

end profile_types;


profile_types.adb

------------------------------------------------------------------------

with text_io;
with ada.long_long_integer_text_io;
with software_base;

package body profile_types is


   -- Procedure: myIntPut
   procedure myIntPut(i: integer) is
   begin
      text_io.put(integer'image(i));
   end myIntPut;

   -- Procedure: addProfileID
   -- Description: adds a ProfileID to a ComponentProfile by adding the
   --              ProfileID to the sequence then sorting the sequence.
   procedure addProfileID(p_id: in ProfileID; cp: in out ComponentProfile) is
   begin
      profile_id_sequence_pkg.add(p_id, cp);
      cp := componentProfileSort(cp);
   end addProfileID;

   -- Procedure: addProfiles
   -- Description: appends the profiles from new_profiles to target then
   --              sorts target.
   procedure addProfiles(new_profiles: in ComponentProfile;
                         target: in out ComponentProfile) is
   begin
      target := profile_id_sequence_pkg.append(target, new_profiles);
      target := componentProfileSort(target);
   end addProfiles;

   -- Function: profileEqual
   function profileEqual(p1, p2: Profile) return boolean is
   begin
      return p1 = p2;
   end profileEqual;

   -- Function: profileLessThan
   function profileLessThan(p1, p2: Profile) return boolean is
   begin
      return p1 < p2;
   end profileLessThan;

   -- Procedure: profilePut
   procedure profilePut(p: Profile) is
   begin
      ada.long_long_integer_text_io.put(p, 0);
   end profilePut;

   -- Procedure: profileFilePut
   procedure profileFilePut(p: Profile) is
   begin
      profilePut(p);
   end profileFilePut;


   -- Function: profileIDLessThan
   function profileIDLessThan(p1, p2: ProfileID) return boolean is
   begin
      return software_base.getProfile(p1) < software_base.getProfile(p2);
   end profileIDLessThan;

   -- Procedure: profileIDPut
   procedure profileIDPut(p_id: ProfileID) is
   begin
      text_io.put(integer'image(p_id));
   end profileIDPut;


   -- Procedure: profileIDFilePut
   procedure profileIDFilePut(p_id: ProfileID) is
   begin
      profileIDPut(p_id);
   end profileIDFilePut;

   -- Function: createProfileLookupTable
   function createProfileLookupTable return ProfileLookupTable is
   begin
      return profile_lookup_table_pkg.create(0);
   end createProfileLookupTable;

end profile_types;

psdl_profile.ads

------------------------------------------------------------------------
-- Package Spec: psdl_profile
--
-- This package contains functions and types that support the collection
-- of operation profiles from a component specified in PSDL.
------------------------------------------------------------------------

with generic_sequence_pkg;
with generic_map_pkg;
with generic_set_pkg;
with ordered_set_pkg;

with psdl_concrete_type_pkg; use psdl_concrete_type_pkg;
with psdl_component_pkg; use psdl_component_pkg;

with profile_types; use profile_types;

package psdl_profile is

   --
   -- Types
   --

   -- OpWithProfile

   type OpWithProfile is record
      op: operator;
      op_profile: ProfileID;
   end record;


   function opWithProfileEqual(owp1: in OpWithProfile; owp2: in OpWithProfile)
      return boolean;

   function opWithProfileLessThan(owp1: in OpWithProfile; owp2: in OpWithProfile)
      return boolean;

   procedure opWithProfilePut(owp: in OpWithProfile);


   -- OpWithProfileSeq

   -- Note: should use addOpWithProfile to add an OpWithProfile to the sequence.
   --       addOpWithProfile keeps the sequence sorted.

   package owp_sequence_pkg is new generic_sequence_pkg(
      t => OpWithProfile, average_size => 4);
   subtype OpWithProfileSeq is owp_sequence_pkg.sequence;

   function opWithProfileSeqEqual is
      new owp_sequence_pkg.generic_equal(eq => opWithProfileEqual);

   function opWithProfileSeqMember is
      new owp_sequence_pkg.generic_member(eq => opWithProfileEqual);

   procedure opWithProfileSeqRemove is
      new owp_sequence_pkg.generic_remove(eq => opWithProfileEqual);

   function opWithProfileSeqSort is
      new owp_sequence_pkg.generic_sort("<" => opWithProfileLessThan);

   procedure opWithProfileSeqPut is
      new owp_sequence_pkg.generic_put(put => opWithProfilePut);

   procedure opWithProfileSeqPrint(owp_seq: in OpWithProfileSeq);

   procedure addOpWithProfile(owp: in OpWithProfile;
                              owp_seq: in out OpWithProfileSeq);


   -- OpWithProfileSet

   package owp_set_pkg is new ordered_set_pkg(
      t => OpWithProfile, eq => opWithProfileEqual,
      "<" => opWithProfileLessThan);
   subtype OpWithProfileSet is owp_set_pkg.set;

   procedure opWithProfileSetPut is
      new owp_set_pkg.generic_put(put => opWithProfilePut);

   procedure opWithProfileSetPrint(owp_set: in OpWithProfileSet);


   -- GenericsMap

   -- Description: this is a mapping of generic type identifiers to
   -- actual types that exist in the component. For example, if the
   -- PSDL type Stack has one generic type named Item and has methods
   -- that have parameters that use the types natural, Stack, and
   -- boolean then there would be four different instantiations of
   -- Stack in the software base representing the four possible
   -- mappings for Item: 1. Item => natural, 2. Item => Stack,
   -- 3. Item => boolean, 4. Item => Item. Option 4 really just
   -- means that Item is mapped to a type that does not appear in the
   -- component. Suppose Stack used two generic types. In that case
   -- each instantiation's GenericsMap would have two entries, one
   -- for each generic type. In such a case the number of different
   -- instantiations present in the software base grows rapidly;
   -- specifically the number would be the cross product of the number
   -- of types across each generic type.


   package generics_map_pkg is new generic_map_pkg(
      key => psdl_id,
      result => psdl_id,
      eq_key => eq,
      eq_res => eq,
      average_size => 8);
   subtype GenericsMap is generics_map_pkg.map;

   procedure psdl_idPut(the_id: in psdl_id);

   procedure genericsMapPut is new generics_map_pkg.generic_put(
      key_put => psdl_idPut, res_put => psdl_idPut);

   -- GenericsMapSet

   package generics_map_set_pkg is new generic_set_pkg(
      t => GenericsMap, eq => generics_map_pkg.equal);
   subtype GenericsMapSet is generics_map_set_pkg.set;

   procedure genericsMapSetPut is
      new generics_map_set_pkg.generic_put(put => genericsMapPut);

   --
   -- Functions


   function getGenericsMaps(filename: in string) return GenericsMapSet;

   function getComponentProfile(filename: in string;
      generics_mapping: in GenericsMap) return ComponentProfile;

   function getOpsWithProfiles(filename: in string;
      generics_mapping: in GenericsMap) return OpWithProfileSeq;

   function getOpsWithProfiles(filename: in string;
      generics_mapping: in GenericsMap) return OpWithProfileSet;

end psdl_profile;

psdl_profile.g

with text_io; use text_io;

with profile_types; use profile_types;
with profile_calc; use profile_calc;

with psdl_io;
with psdl_concrete_type_pkg; use psdl_concrete_type_pkg;
with psdl_component_pkg; use psdl_component_pkg;
with psdl_program_pkg; use psdl_program_pkg;
with psdl_id_set_subtype_pkg;
with psdl_id_pkg;
with software_base;

with generic_map_pkg;
with generic_sequence_pkg;

package body psdl_profile is

   package signature_seq_pkg is new generic_sequence_pkg(
      t => Signature, average_size => 2);
   subtype SignatureSequence is signature_seq_pkg.sequence;


   -- Function: opWithProfileEqual
   function opWithProfileEqual(owp1: in OpWithProfile; owp2: in OpWithProfile)
      return boolean is
   begin
      -- if not profileEqual(owp1.op_profile, owp2.op_profile) then
      if owp1.op_profile /= owp2.op_profile then
         return false;
      end if;
      return eq(owp1.op, owp2.op);
   end opWithProfileEqual;

   -- Function: opWithProfileLessThan
   function opWithProfileLessThan(owp1: in OpWithProfile;
                                  owp2: in OpWithProfile)
      return boolean is
   begin
      -- return profileLessThan(owp1.op_profile, owp2.op_profile);
      return profileLessThan(software_base.getProfile(owp1.op_profile),
                             software_base.getProfile(owp2.op_profile));
   end opWithProfileLessThan;


   -- Procedure: opWithProfilePut
   procedure opWithProfilePut(owp: in OpWithProfile) is
   begin
      put("(");
      put(convert(name(owp.op)));
      put(": ");
      foreach((the_id: psdl_id; the_tn: type_name),
              type_declaration_pkg.scan, (inputs(owp.op)),
         put(convert(the_tn.name));
         put(" ");
      )
      put("-> ");
      foreach((the_id: psdl_id; the_tn: type_name),
              type_declaration_pkg.scan, (outputs(owp.op)),
         put(convert(the_tn.name));
         put(" ");
      )
      put(", ");
      profilePut(software_base.getProfile(owp.op_profile));
      put(")");
   end opWithProfilePut;

   -- Procedure: opWithProfileSeqPrint
   procedure opWithProfileSeqPrint(owp_seq: in OpWithProfileSeq) is
   begin
      foreach((owp: OpWithProfile), owp_sequence_pkg.scan, (owp_seq),
         put(convert(name(owp.op)));
         put(": ");
         foreach((the_id: psdl_id; the_tn: type_name),
                 type_declaration_pkg.scan, (inputs(owp.op)),
            put(convert(the_tn.name));
            put(" ");
         )
         put("-> ");
         foreach((the_id: psdl_id; the_tn: type_name),
                 type_declaration_pkg.scan, (outputs(owp.op)),
            put(convert(the_tn.name));
            put(" ");
         )
         put(" ");
         profilePut(software_base.getProfile(owp.op_profile));
         new_line;
      )
   end opWithProfileSeqPrint;


   -- Procedure: opWithProfileSetPrint
   procedure opWithProfileSetPrint(owp_set: in OpWithProfileSet) is
   begin
      foreach((owp: OpWithProfile), owp_set_pkg.scan, (owp_set),
         put(convert(name(owp.op)));
         put(": ");
         foreach((the_id: psdl_id; the_tn: type_name),
                 type_declaration_pkg.scan, (inputs(owp.op)),
            put(convert(the_tn.name));
            put(" ");
         )
         put("-> ");
         foreach((the_id: psdl_id; the_tn: type_name),
                 type_declaration_pkg.scan, (outputs(owp.op)),
            put(convert(the_tn.name));
            put(" ");
         )
         new_line;
         profilePut(software_base.getProfile(owp.op_profile));
         new_line;
      )
   end opWithProfileSetPrint;


   -- Procedure: addOpWithProfile
   procedure addOpWithProfile(owp: in OpWithProfile;
                              owp_seq: in out OpWithProfileSeq) is
   begin
      owp_sequence_pkg.add(owp, owp_seq);
      owp_seq := opWithProfileSeqSort(owp_seq);
   end addOpWithProfile;


   -- Function: createNumericSignatures
   -- Description: helper function to create numeric signatures for
   --              an operator.
   function createNumericSignatures(op: in operator;
      generics_mapping: GenericsMap; type_id: psdl_id)
      return SignatureSequence is

      package type_map_pkg is
         new generic_map_pkg(
            key => type_name,
            result => integer,
            eq_key => equal,
            eq_res => "=",
            average_size => 2);
      subtype type_map is type_map_pkg.map;

      -- if a type from the same sort group is already in the map
      -- then return the number that represents that sort group
      -- otherwise return 0, indicating this is a type from a new
      -- sort group
      function getSortGroupNum(the_type: type_name;
                               the_type_map: type_map) return integer is
         return_val: integer;
      begin
         return_val := 0;
         foreach((the_tn: type_name; the_num: integer),
                 type_map_pkg.scan, (the_type_map),
            if same_sort_group(the_type, the_tn) then
               return_val := the_num;
               -- TODO: should be exit loop here but don't know how to
            end if;
         )
         return return_val;
      end getSortGroupNum;


      the_inputs: type_declaration := inputs(op);
      the_outputs: type_declaration := outputs(op);
      the_type_map: type_map;
      i, t: natural;
      sort_group_num: integer;
      gen_set: psdl_id_set_subtype_pkg.psdl_id_set;
      temp_signature: Signature;
      temp_tn: type_name;
      return_val: SignatureSequence;
      type_occurrence_count: natural;
      bool_count, char_count, string_count, int_count, float_count: natural;

      procedure update_additional_counts(the_tn: type_name) is
      begin
         if eq(temp_tn.name, type_id) then
            type_occurrence_count := type_occurrence_count + 1;
         elsif same_sort_group(the_tn, boolean_type) then
            bool_count := bool_count + 1;
         elsif same_sort_group(the_tn, character_type) then
            char_count := char_count + 1;
         elsif same_sort_group(the_tn, string_type) then
            string_count := string_count + 1;
         elsif same_sort_group(the_tn, integer_type) then
            int_count := int_count + 1;
         elsif same_sort_group(the_tn, float_type) then
            float_count := float_count + 1;
         end if;
      end;


   begin
      type_map_pkg.create(0, the_type_map);

      -- for each output
      foreach((o_id: psdl_id; o_tn: type_name),
              type_declaration_pkg.scan, (the_outputs),

         type_map_pkg.recycle(the_type_map);
         i := 0;
         t := 0;
         type_occurrence_count := 0;
         bool_count := 0;
         char_count := 0;
         string_count := 0;
         int_count := 0;
         float_count := 0;

         -- for each input
         foreach((i_id: psdl_id; i_tn: type_name),
                 type_declaration_pkg.scan, (the_inputs),

            -- check if type is a generic type or a regular type
            if generics_map_pkg.member(i_tn.name, generics_mapping) then
               temp_tn := create(
                  generics_map_pkg.fetch(generics_mapping, i_tn.name),
                  psdl_id_sequence_pkg.empty,
                  type_declaration_pkg.create(null_type));
            else
               -- could probably use i_tn as is rather than create
               -- a copy but we're being safe in case i_tn has some
               -- residue in its formals and gen pars
               temp_tn := create(i_tn.name,
                  psdl_id_sequence_pkg.empty,
                  type_declaration_pkg.create(null_type));
            end if;

            update_additional_counts(temp_tn);

            -- if the type isn't in the map yet then put it in
            if not type_map_pkg.member(temp_tn, the_type_map) then
               sort_group_num := getSortGroupNum(temp_tn, the_type_map);
               if sort_group_num = 0 then
                  t := t + 1;
                  type_map_pkg.bind(temp_tn, t, the_type_map);
               end if;
            end if;

            -- add the input's sort group number
            i := i + 1;
            temp_signature(i) := getSortGroupNum(temp_tn, the_type_map);
         )


         -- handle the output

         -- check if type is a generic type or a regular type
         if generics_map_pkg.member(o_tn.name, generics_mapping) then
            temp_tn := create(
               generics_map_pkg.fetch(generics_mapping, o_tn.name),
               psdl_id_sequence_pkg.empty,
               type_declaration_pkg.create(null_type));
         else
            -- could probably use o_tn as is rather than create
            -- a copy but we're being safe in case o_tn has some
            -- residue in its formals and gen pars
            temp_tn := create(o_tn.name,
               psdl_id_sequence_pkg.empty,
               type_declaration_pkg.create(null_type));
         end if;

         update_additional_counts(temp_tn);

         -- if the type isn't in the map yet then put it in
         if not type_map_pkg.member(temp_tn, the_type_map) then
            sort_group_num := getSortGroupNum(temp_tn, the_type_map);
            if sort_group_num = 0 then
               t := t + 1;
               type_map_pkg.bind(temp_tn, t, the_type_map);
            end if;
         end if;

         -- add the output's sort group number
         i := i + 1;
         temp_signature(i) := getSortGroupNum(temp_tn, the_type_map);
         -- mark end of signature
         i := i + 1;
         temp_signature(i) := 0;

         -- add the type occurrence count to the signature
         i := i + 1;
         temp_signature(i) := type_occurrence_count;

         -- add basic type counts in
         i := i + 1;
         temp_signature(i) := bool_count;
         i := i + 1;
         temp_signature(i) := char_count;
         i := i + 1;
         temp_signature(i) := string_count;
         i := i + 1;
         temp_signature(i) := int_count;
         i := i + 1;
         temp_signature(i) := float_count;
         i := i + 1;
         temp_signature(i) := 0;

         -- add the signature to the sequence of signatures
         signature_seq_pkg.add(temp_signature, return_val);
      )

      return return_val;
   end createNumericSignatures;


   -- Function: getOperatorProfiles
   -- Description: helper function to collect the profiles for
   --              an operator. A ComponentProfile (sequence of
   --              profiles) is used because if an operator has
   --              more than one output it is treated as if there
   --              is a separate operator for each output.
   function getOperatorProfiles(op: operator;
      generics_mapping: in GenericsMap; type_id: psdl_id)
      return ComponentProfile is

      return_val: ComponentProfile;
      numeric_sigs: SignatureSequence;

   begin
      -- convert the operator's signature to numeric signatures
      -- (see the comments in the specification of profile_calc)
      numeric_sigs := createNumericSignatures(op, generics_mapping, type_id);

      -- compute the profile for each signature
      foreach((sig: Signature), signature_seq_pkg.scan, (numeric_sigs),
         addProfileID(software_base.getProfileID(computeProfile(sig)),
                      return_val);
      )

      return return_val;
   end getOperatorProfiles;


   -- Function: getComponentProfile
   -- Description: this function will return the ComponentProfile
   --              for a component specified in PSDL in the PSDL
   --              file filename.
   function getComponentProfile(filename: in string;
      generics_mapping: in GenericsMap) return ComponentProfile is

      the_file: file_type;
      the_prog: psdl_program;
      return_val: ComponentProfile;

   begin
      -- parse the psdl file to create a psdl_program
      open(the_file, IN_FILE, filename);
      assign(the_prog, psdl_program_pkg.empty_psdl_program);
      psdl_io.get(the_file, the_prog);
      close(the_file);

      -- if the program contains more than one component
      -- then just get the first one since the program
      -- is only supposed to have one (a requirement of
      -- this implementation)
      foreach((c_id: psdl_id; c: psdl_component),
              psdl_program_map_pkg.scan, (the_prog),

         -- if the component is a single operator then just
         -- get the profile for that operator
         if component_category(c) = psdl_operator then
            addProfiles(getOperatorProfiles(c, generics_mapping, empty),
                        return_val);

         -- otherwise the component is a type so get the profiles
         -- for each of its operators
         else
            foreach((id: psdl_id; o: operator),
                    operation_map_pkg.scan, (operations(c)),
               addProfiles(getOperatorProfiles(o, generics_mapping,
                  psdl_id_pkg.Upper_To_Lower(c_id)), return_val);
            )
         end if;

         -- TODO: need to break out of this loop so that only the
         --       first component is processed.
      )

      return return_val;
   end getComponentProfile;

   -- Function: splitOp


   -- Description: helper function to split an operator with more
   --              than one output into a sequence of operators
   --              where each operator has one of the outputs.
   --              When splitting, instances of the operator's generic
   --              types in the inputs and the outputs are converted to
   --              their mapped types according to the generics mapping.
   --              Each split operator's profile is then calculated.
   function splitOp(op: operator; generics_mapping: in GenericsMap;
      type_id: psdl_id)
      return OpWithProfileSeq is

      return_val: OpWithProfileSeq;
      temp_owp: OpWithProfile;
      temp_output_name: psdl_id;
      temp_output_type: type_name;
      numeric_sigs: SignatureSequence;

   begin
      -- for each output
      foreach((o_id: psdl_id; o_tn: type_name),
              type_declaration_pkg.scan, (outputs(op)),

         -- make a copy of op but with only the current output
         temp_owp.op := make_atomic_operator(
            psdl_name => name(op),
            ada_name => ada_name(op),
            gen_par => generic_parameters(op),
            keywords => keywords(op),
            axioms => axioms(op),
            state => states(op));


         -- add the inputs
         foreach((i_id: psdl_id; i_tn: type_name),
                 type_declaration_pkg.scan, (inputs(op)),
            if generics_map_pkg.member(i_tn.name, generics_mapping) then
               add_input(i_id, create(
                  generics_map_pkg.fetch(generics_mapping, i_tn.name),
                  psdl_id_sequence_pkg.empty,
                  type_declaration_pkg.create(null_type)),
                  temp_owp.op);
            else
               add_input(i_id, i_tn, temp_owp.op);
            end if;
         )

         -- add the output
         if generics_map_pkg.member(o_tn.name, generics_mapping) then
            add_output(o_id, create(
               generics_map_pkg.fetch(generics_mapping, o_tn.name),
               psdl_id_sequence_pkg.empty,
               type_declaration_pkg.create(null_type)),
               temp_owp.op);
         else
            add_output(o_id, o_tn, temp_owp.op);
         end if;


         -- Convert the new operator's signature to numeric signatures
         -- (see the comments in the specification of profile_calc).
         -- Note the call to createNumericSignatures can now just pass
         -- an empty GenericsMap since the generics were mapped to actual
         -- types in the above code.
         numeric_sigs :=
            createNumericSignatures(temp_owp.op,
               generics_map_pkg.create(empty), type_id);

         -- compute the new operator's profile
         temp_owp.op_profile := software_base.getProfileID(computeProfile(
            signature_seq_pkg.fetch(numeric_sigs, 1)));

         -- add the new operator-with-profile to return_val
         addOpWithProfile(temp_owp, return_val);
      )

      return return_val;
   end splitOp;


   -- Function: getOpsWithProfiles
   -- Description: constructs a sequence of OpWithProfiles (a PSDL operator
   --              and its corresponding profile) representing the operators
   --              in the PSDL component specified in filename.
   function getOpsWithProfiles(filename: in string;
      generics_mapping: in GenericsMap) return OpWithProfileSeq is

      the_file: file_type;
      the_prog: psdl_program;
      return_val, foo: OpWithProfileSeq := owp_sequence_pkg.empty;

   begin
      -- parse the psdl file to create a psdl_program
      open(the_file, IN_FILE, filename);
      assign(the_prog, psdl_program_pkg.empty_psdl_program);
      psdl_io.get(the_file, the_prog);
      close(the_file);

      -- if the program contains more than one component
      -- then just get the first one since the program
      -- is only supposed to have one (a requirement of
      -- this implementation). Generic maps need a method
      -- that allows the user to fetch a single mapping
      -- in the map.
      foreach((c_id: psdl_id; c: psdl_component),
              psdl_program_map_pkg.scan, (the_prog),

         -- if the component is a single operator then just
         -- get that operator
         if component_category(c) = psdl_operator then
            foreach((owp: OpWithProfile), owp_sequence_pkg.scan,
                    (splitOp(c, generics_mapping, empty)),
               addOpWithProfile(owp, return_val);
            )

         -- otherwise the component is a type so get
         -- each of its operators
         else
            foreach((id: psdl_id; o: operator),
                    operation_map_pkg.scan, (operations(c)),

               foreach((owp: OpWithProfile), owp_sequence_pkg.scan,
                       (splitOp(o, generics_mapping,
                          psdl_id_pkg.Upper_To_Lower(c_id))),
                  addOpWithProfile(owp, return_val);

                  -- in the above statement we
                  -- temporarily pass the generic parameters for the whole
                  -- type, c. Should really just pass the generic
                  -- parameters for the operation, o, only. This will
                  -- happen when generics get reworked.
               )
            )
         end if;
         -- TODO: need to break out of this loop so that only the
         --       first component is processed.
      )

      return return_val;
   end getOpsWithProfiles;


   -- Function: getOpsWithProfiles
   -- Description: constructs a set of OpWithProfiles (a PSDL operator
   --              and its corresponding profile) representing the operators
   --              in the PSDL component specified in filename.
   function getOpsWithProfiles(filename: in string;
      generics_mapping: in GenericsMap) return OpWithProfileSet is

      the_file: file_type;
      the_prog: psdl_program;
      return_val: OpWithProfileSet;

   begin
      -- parse the psdl file to create a psdl_program
      open(the_file, IN_FILE, filename);
      assign(the_prog, psdl_program_pkg.empty_psdl_program);
      psdl_io.get(the_file, the_prog);
      close(the_file);

      -- if the program contains more than one component
      -- then just get the first one since the program
      -- is only supposed to have one (a requirement of
      -- this implementation). Generic maps need a method
      -- that allows the user to fetch a single mapping
      -- in the map.
      foreach((c_id: psdl_id; c: psdl_component),
              psdl_program_map_pkg.scan, (the_prog),

         -- if the component is a single operator then just
         -- get that operator
         if component_category(c) = psdl_operator then
            foreach((owp: OpWithProfile), owp_sequence_pkg.scan,
                    (splitOp(c, generics_mapping, empty)),
               owp_set_pkg.add(owp, return_val);
            )

         -- otherwise the component is a type so get
         -- each of its operators
         else
            foreach((id: psdl_id; o: operator),
                    operation_map_pkg.scan, (operations(c)),

               foreach((owp: OpWithProfile), owp_sequence_pkg.scan,
                       (splitOp(o, generics_mapping,
                          psdl_id_pkg.Upper_To_Lower(c_id))),
                  owp_set_pkg.add(owp, return_val);

                  -- in the above statement we
                  -- temporarily pass the generic parameters for the whole
                  -- type, c. Should really just pass the generic
                  -- parameters for the operation, o, only. This will
                  -- happen when generics get reworked.
               )
            )
         end if;
         -- TODO: need to break out of this loop so that only the
         --       first component is processed.
      )

      return return_val;
   end getOpsWithProfiles;

   -- Procedure: psdl_idPut
   procedure psdl_idPut(the_id: in psdl_id) is
   begin
      put(convert(the_id));
   end psdl_idPut;


   -- Function: getGenericsMaps
   -- Description: generates all the possible mappings of generic types
   --              to actual types for all the generic parameters in
   --              the component specified in the PSDL file, filename.
   --              See description of GenericsMap in psdl_profile.ads.
   --              This is done by collecting all the types used in the
   --              operations of the component (note we are only processing
   --              type components, not operator components) into a set
   --              and then performing the cross-product of this set with
   --              the set of generic parameters.
   function getGenericsMaps(filename: in string) return GenericsMapSet is

      the_file: file_type;
      the_prog: psdl_program;
      return_val: GenericsMapSet;
      gen_set: psdl_id_set;
      type_set: psdl_id_set;
      temp_map: GenericsMap;

      procedure cross_product(g_set, t_set: psdl_id_set; gens_map: GenericsMap) is
         temp_set: psdl_id_set;
         g: psdl_id;
         local_map: GenericsMap;
      begin
         generics_map_pkg.assign(local_map, gens_map);
         if psdl_id_set_pkg.size(g_set) > 0 then
            psdl_id_set_pkg.assign(temp_set, g_set);
            g := psdl_id_set_pkg.choose(g_set);
            foreach((the_type_id: psdl_id), psdl_id_set_pkg.scan, (t_set),
               generics_map_pkg.bind(g, the_type_id, local_map);
               psdl_id_set_pkg.remove(g, temp_set);
               cross_product(temp_set, t_set, local_map);
               generics_map_pkg.assign(local_map, gens_map);
            )
            generics_map_pkg.recycle(temp_map);
         else
            generics_map_set_pkg.add(local_map, return_val);
         end if;
      end cross_product;


   begin
      return_val := generics_map_set_pkg.empty;

      -- parse the psdl file to create a psdl_program
      open(the_file, IN_FILE, filename);
      assign(the_prog, psdl_program_pkg.empty_psdl_program);
      psdl_io.get(the_file, the_prog);
      close(the_file);

      -- if the program contains more than one component
      -- then just get the first one since the program
      -- is only supposed to have one (a requirement of
      -- this implementation). Generic maps need a method
      -- that allows the user to fetch a single mapping
      -- in the map.
      foreach((c_id: psdl_id; c: psdl_component), psdl_program_map_pkg.scan,
              (the_prog),

         -- collect the names of the generic parameters
         foreach((the_id: psdl_id; the_tn: type_name),
                 type_declaration_pkg.scan, (generic_parameters(c)),
            if eq(psdl_id_pkg.Upper_To_Lower(the_tn.name),
                  convert("private_type")) then
               psdl_id_set_pkg.add(psdl_id_pkg.Upper_To_Lower(the_id),
                                   gen_set);
            end if;
         )

         -- collect the types used in all the operators
         if component_category(c) = psdl_type then
            foreach((o_id: psdl_id; o: operator),
                    operation_map_pkg.scan, (operations(c)),

               -- inputs
               foreach((the_id: psdl_id; the_tn: type_name),
                       type_declaration_pkg.scan, (inputs(o)),
                  psdl_id_set_pkg.add(
                     psdl_id_pkg.Upper_To_Lower(the_tn.name), type_set);
               )

               -- outputs
               foreach((the_id: psdl_id; the_tn: type_name),
                       type_declaration_pkg.scan, (outputs(o)),
                  psdl_id_set_pkg.add(
                     psdl_id_pkg.Upper_To_Lower(the_tn.name), type_set);
               )
            )
         end if;

         -- TODO: need to break out of this loop so that only the
         --       first component is processed.
      )

      generics_map_pkg.create(empty, temp_map);
      cross_product(gen_set, type_set, temp_map);

      return return_val;
   end getGenericsMaps;


end psdl_profile;
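
The recursion in cross_product binds one generic parameter at a time to every collected type, so a component with g generic parameters and t collected types yields t to the power g mappings. The stand-alone sketch below illustrates that growth. The enumeration types, their literals and the procedure names are hypothetical and chosen only for illustration; it uses fixed enumerations in place of psdl_id sets and is not the code of getGenericsMaps.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Generics_Map_Demo is
   -- Hypothetical generic parameters and candidate types.
   type Generic_Name is (Item, Index);
   type Type_Name is (Natural_T, Boolean_T, Stack_T);

   -- One complete mapping: every generic parameter bound to a type.
   type Mapping is array (Generic_Name) of Type_Name;

   Count : Natural := 0;

   -- Bind generic G to each candidate type in turn, then recurse on
   -- the next generic; a mapping is complete at the last generic.
   procedure Enumerate (G : Generic_Name; Partial : Mapping) is
      Local : Mapping := Partial;
   begin
      for T in Type_Name loop
         Local (G) := T;
         if G = Generic_Name'Last then
            Count := Count + 1;
            for K in Generic_Name loop
               Put (Generic_Name'Image (K) & " => "
                    & Type_Name'Image (Local (K)) & "  ");
            end loop;
            New_Line;
         else
            Enumerate (Generic_Name'Succ (G), Local);
         end if;
      end loop;
   end Enumerate;

begin
   Enumerate (Generic_Name'First, (others => Natural_T));
   Put_Line ("mappings:" & Natural'Image (Count));
end Generics_Map_Demo;
```

With two generic parameters and three candidate types the sketch enumerates nine mappings, which shows why the number of instantiations stored in the software base grows rapidly as generic parameters are added.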
run_batch.g


------------------------------------------------------------------------
-- Program: run_batch
--
-- Description: collects statistics for measuring the effect different
--              profile definitions have on profile filtering and
--              signature matching.
------------------------------------------------------------------------

with text_io; use text_io;
with a_strings; use a_strings;
with psdl_concrete_type_pkg; use psdl_concrete_type_pkg;
with profile_calc; use profile_calc;
with psdl_profile; use psdl_profile;
with profile_types; use profile_types;
with component_id_types; use component_id_types;
with haase_diagram; use haase_diagram;
with candidate_types; use candidate_types;
with software_base;
with sig_match_types; use sig_match_types;
with sig_match; use sig_match;


procedure run_batch is
   the_candidates: CandidateSet;
   sn, the_branch, another_branch: SigMatchNode;
   q_ops, c_ops: OpWithProfileSeq;
   batch_file: file_type;
   input_line: string(1..256);
   line_length: natural;
   queries_dir, results_dir: a_string;
   query_filename, sm_filename, p_hist_filename, sm_hist_filename: a_string;
   temp_candidate: Candidate;

   procedure printArrayProfiles(profile_array: in ArrayProfiles;
                                num_profiles: in integer) is
      the_profile: ArrayProfile;
      i: integer;
      rval: integer;
   begin
      for i in 1..num_profiles loop
         the_profile := profile_array(i);
         rval := printArrayProfile(the_profile);
         new_line;
      end loop;
   end printArrayProfiles;


begin
  put_line("Initializing Software Base...");
  software_base.initialize("sb_header.txt");
  put_line("finished.");

  put(integer'image(software_base.numComponents));
  put(" components in ");
  put(integer'image(software_base.numOccupiedPartitions));
  put_line(" partitions.");
  new_line;

  put("Generating GML...");
  software_base.generateGML("haase_diagram.gml");
  put_line("finished.");
  new_line;

  -- The first line of batch.txt names the base directory; every
  -- following line names one query to process.
  open(batch_file, in_file, "batch.txt");
  get_line(batch_file, input_line, line_length);
  queries_dir := to_a(input_line(1..line_length)) & "/queries/";
  results_dir := to_a(input_line(1..line_length)) & "/results/";
  put_line(convert(text(queries_dir)));
  put_line(convert(text(results_dir)));
  while (not end_of_file(batch_file)) loop
    get_line(batch_file, input_line, line_length);
    new_line;
    put("PROCESSING ");
    put_line(input_line(1..line_length));
    new_line;

    query_filename := queries_dir & to_a(input_line(1..line_length)) & ".psdl";
    p_hist_filename := results_dir & to_a(input_line(1..line_length)) & "-p-hist.txt";
    sm_hist_filename := results_dir & to_a(input_line(1..line_length)) & "-sm-hist.txt";
    sm_filename := results_dir & to_a(input_line(1..line_length)) & "-sm-stat.txt";

    put("Profile Filtering...");
    the_candidates := software_base.profileFilter(
      convert(text(query_filename)));
    put_line("finished.");

    put(integer'image(candidate_set_pkg.size(the_candidates)));
    put_line(" candidates.");
    candidateSetPut(the_candidates);
    new_line;
    new_line;

    generateProfileHistogram(convert(text(p_hist_filename)), the_candidates);

    the_candidates := profileSkim(1.0, the_candidates);
    put(integer'image(candidate_set_pkg.size(the_candidates)));
    put_line(" candidates have profile rank 1.");
    candidateSetPut(the_candidates);
    new_line;
    new_line;

    put("Signature Matching...");
    if candidate_set_pkg.size(the_candidates) > 0 then
      temp_candidate := software_base.signatureMatch(
        convert(text(query_filename)),
        candidate_set_pkg.choose(the_candidates));
      generateSigMatchHistogram(convert(text(sm_hist_filename)),
                                temp_candidate);
      sigMatchStatsPut(convert(text(sm_filename)));
    end if;
  end loop;
  close(batch_file);
end run_batch;
sig_match.ads

--------------------------------------------------------------------------------

with psdl_profile; use psdl_profile;
with sig_match_types; use sig_match_types;

package sig_match is

  procedure match_ops(query, candidate: in OpWithProfileSeq;
                      root_sn: in out SigMatchNode);

  procedure sigMatchStatsReset;
  procedure sigMatchStatsPut(filename: string);

end sig_match;
--------------------------------------------------------------------------------

with text_io; use text_io;
with psdl_concrete_type_pkg; use psdl_concrete_type_pkg;
with psdl_component_pkg; use psdl_component_pkg;
with profile_types; use profile_types;
with psdl_profile; use psdl_profile;
with sig_match_types; use sig_match_types;

package body sig_match is

  -- statistics counters reported by sigMatchStatsPut
  failed_outputs: natural := 0;
  passed_outputs: natural := 0;
  failed_basics: natural := 0;
  passed_basics: natural := 0;
  duplicates: natural := 0;
  total_inputs: natural := 0;
  failed_inputs: natural := 0;


  -- Function: get_basics
  -- Description: removes any user-defined types from the inputs argument,
  --              thereby returning a type declaration with predefined
  --              types only.
  function get_basics(inputs: in type_declaration) return type_declaration is
    return_val: type_declaration;
  begin
    type_declaration_pkg.assign(return_val, inputs);
    foreach((the_id: psdl_id; the_tn: type_name), type_declaration_pkg.scan,
            (inputs),
      if not is_predefined(the_tn) then
        type_declaration_pkg.remove(the_id, return_val);
      end if;
    )
    return return_val;
  end get_basics;


  -- Function: get_user_defined
  -- Description: removes any predefined types from the inputs argument,
  --              thereby returning a type declaration with user-defined
  --              types only.
  function get_user_defined(inputs: in type_declaration)
      return type_declaration is
    return_val: type_declaration;
  begin
    type_declaration_pkg.assign(return_val, inputs);
    foreach((the_id: psdl_id; the_tn: type_name), type_declaration_pkg.scan,
            (inputs),
      if is_predefined(the_tn) then
        type_declaration_pkg.remove(the_id, return_val);
      end if;
    )
    return return_val;
  end get_user_defined;


  -- Function: match_basics
  -- Description: determines if the query's basic input types can match the
  --              candidate's basic input types given the following rule:
  --              Basic types: either they must match exactly or the
  --              query's input type must be a subtype of the component's
  --              input type.
  function match_basics(q_basics, c_basics: in type_declaration)
      return boolean is
    the_q_basics: type_declaration;
    the_c_basics: type_declaration;
    new_q_basics: type_declaration;
    new_c_basics: type_declaration;
    found_match, found_c2, return_val: boolean;
  begin
    type_declaration_pkg.assign(new_q_basics, q_basics);
    type_declaration_pkg.assign(new_c_basics, c_basics);

    -- cannot match if the query has a different number of basics than
    -- the candidate
    if type_declaration_pkg.size(q_basics) /=
       type_declaration_pkg.size(c_basics) then
      return false;
    end if;

    -- filter out the basics that match exactly
    type_declaration_pkg.assign(the_c_basics, new_c_basics);
    foreach((q_id: psdl_id; q_tn: type_name), type_declaration_pkg.scan,
            (q_basics),
      found_match := false;
      foreach((c_id: psdl_id; c_tn: type_name), type_declaration_pkg.scan,
              (new_c_basics),
        if not found_match then
          if equal(q_tn, c_tn) then
            type_declaration_pkg.remove(q_id, new_q_basics);
            type_declaration_pkg.remove(c_id, the_c_basics);
            found_match := true;
          end if;
        end if;
        -- TODO: would rather break out of the inner for loop when a
        -- match is found rather than do this found_match stuff.
      )
      type_declaration_pkg.assign(new_c_basics, the_c_basics);
    )

    -- Filter out the remaining basics that can match to supertypes.
    -- This is done by temporarily mapping each query input type to a
    -- supertype in the candidate that is closest in the partial ordering
    -- of basic types.
    type_declaration_pkg.assign(the_q_basics, new_q_basics);
    foreach((q_id: psdl_id; q_tn: type_name), type_declaration_pkg.scan,
            (the_q_basics),
      found_match := false;
      type_declaration_pkg.assign(the_c_basics, new_c_basics);
      foreach((c_id: psdl_id; c_tn: type_name), type_declaration_pkg.scan,
              (the_c_basics),
        if not found_match then
          if subtype_of(q_tn, c_tn) then
            -- look for a candidate type c2 that lies between q and c
            found_c2 := false;
            foreach((c2_id: psdl_id; c2_tn: type_name),
                    type_declaration_pkg.scan, (the_c_basics),
              if not found_c2 then
                if not equal(q_tn, c2_tn) then
                  if subtype_of(q_tn, c2_tn) and
                     subtype_of(c2_tn, c_tn) then
                    found_c2 := true;
                  end if;
                end if;
              end if;
            )
            if not found_c2 then
              type_declaration_pkg.remove(q_id, new_q_basics);
              type_declaration_pkg.remove(c_id, new_c_basics);
              found_match := true;
            end if;
          end if;
        end if;
      )
    )

    -- if there are any basics left over then a match is not possible since
    -- basics cannot be matched to non-basics
    return_val := type_declaration_pkg.size(new_q_basics) = 0;

    -- recycle local variables
    type_declaration_pkg.recycle(new_q_basics);
    type_declaration_pkg.recycle(new_c_basics);
    type_declaration_pkg.recycle(the_q_basics);
    type_declaration_pkg.recycle(the_c_basics);

    return return_val;
  end match_basics;


  -- Procedure: match_outputs
  -- Description: This procedure serves two purposes: 1. to determine if
  --              the outputs of the matched operations can match, and
  --              2. if they can match, add the type mappings to sn.V.TM.
  procedure match_outputs(sn: in out SigMatchNode; success: out boolean) is
    q_output_type, c_output_type: type_name;
  begin
    success := true;
    foreach((q_op: operator; c_op: operator), op_map_pkg.scan, (sn.V.OM),
      if success then
        -- get q_op's one-and-only output type
        q_output_type := type_declaration_pkg.res_set_pkg.choose(
          type_declaration_pkg.map_range(outputs(q_op)));
        -- get c_op's one-and-only output type
        c_output_type := type_declaration_pkg.res_set_pkg.choose(
          type_declaration_pkg.map_range(outputs(c_op)));

        if is_predefined(q_output_type) or
           is_predefined(c_output_type) then
          if not subtype_of(q_output_type, c_output_type) then
            success := false;
          end if;
        elsif type_map_pkg.member(q_output_type, sn.V.TM) then
          if not equal(c_output_type,
                       type_map_pkg.fetch(sn.V.TM, q_output_type)) then
            success := false;
          end if;
        else
          type_map_pkg.bind(q_output_type, c_output_type, sn.V.TM);
        end if;
      end if;
    )
  end match_outputs;


  -- Procedure: match_inputs
  -- Description:
  procedure match_inputs(root_sn: in out SigMatchNode; success: out boolean) is

    procedure match(q_inputs, c_inputs: in type_declaration;
                    root_sn: in out SigMatchNode; success: out boolean) is
      new_q_inputs, new_c_inputs: type_declaration;
      temp_q_inputs, temp_c_inputs: type_declaration;
      ci: type_name;
      temp_sn: SigMatchNodePtr;
      temp_id: psdl_id;
      found_temp_id: boolean;
      got_first_qi: boolean;
      return_val: SigMatchNode;
    begin
      return_val := createSigMatchNode;
      sigMatchNodeAssign(return_val, root_sn);

      type_declaration_pkg.assign(new_q_inputs, q_inputs);
      type_declaration_pkg.assign(new_c_inputs, c_inputs);
      success := true;
      foreach((q_id: psdl_id; qi: type_name),
              type_declaration_pkg.scan, (q_inputs),
        if success then
          if type_map_pkg.member(qi, root_sn.V.TM) then
            ci := type_map_pkg.fetch(root_sn.V.TM, qi);
            -- if the current query input type is already mapped
            -- then make sure it is mapped to an existing type in
            -- the candidate's inputs. Note to test this we must
            -- look at the type declaration's range (the types)
            -- not its domain (the psdl ids).
            if not type_declaration_pkg.res_set_pkg.member(ci,
                 type_declaration_pkg.map_range(c_inputs)) then
              success := false;
            else
              -- remove qi from new_q_inputs
              type_declaration_pkg.remove(q_id, new_q_inputs);
              -- remove ci from new_c_inputs
              found_temp_id := false;
              if not found_temp_id then
                foreach((c_id: psdl_id; c_tn: type_name),
                        type_declaration_pkg.scan, (new_c_inputs),
                  if equal(ci, c_tn) then
                    temp_id := c_id;
                    found_temp_id := true;
                    -- TODO: would rather break out of for loop.
                  end if;
                )
              end if;

              if found_temp_id then
                type_declaration_pkg.remove(temp_id, new_c_inputs);
              else
                -- if this else block gets called
                -- there is something wrong
                put_line("there is something wrong");
                success := false;
              end if;
            end if;
          end if;
        end if;
      )
      if success then
        -- foreach is used as a cheesy way of only getting the first
        -- element out of the map. Maps need a way of fetching by
        -- i'th element.
        got_first_qi := false;
        foreach((q_id: psdl_id; qi: type_name),
                type_declaration_pkg.scan, (q_inputs),
          if not got_first_qi then
            got_first_qi := true;
            foreach((c_id: psdl_id; c_tn: type_name),
                    type_declaration_pkg.scan, (c_inputs),
              temp_sn := new SigMatchNode'(createSigMatchNode);
              sigMatchNodeAssign(temp_sn.all, root_sn);
              temp_sn.expanded_for_inputs := false;
              type_map_pkg.bind(qi, c_tn, temp_sn.V.TM);
              type_declaration_pkg.assign(temp_q_inputs,
                                          new_q_inputs);
              type_declaration_pkg.assign(temp_c_inputs,
                                          new_c_inputs);
              type_declaration_pkg.remove(q_id, temp_q_inputs);
              type_declaration_pkg.remove(c_id, temp_c_inputs);
              match(temp_q_inputs, temp_c_inputs, temp_sn.all,
                    success);
              if success then
                addBranch(temp_sn, return_val);
              end if;
            )
          end if;
        )
      end if;
      sigMatchNodeAssign(root_sn, return_val);
    end match;

    q_inputs, c_inputs: type_declaration;

  begin
    success := true;
    foreach((q_op: operator; c_op: operator), op_map_pkg.scan, (root_sn.V.OM),
      if success then

        -- Remove the input types that have already been mapped.
        type_declaration_pkg.assign(q_inputs, inputs(q_op));
        type_declaration_pkg.assign(c_inputs, inputs(c_op));

        -- query
        foreach((the_id: psdl_id; the_tn: type_name),
                type_declaration_pkg.scan, (inputs(q_op)),
          if type_map_pkg.key_set_pkg.member(the_tn,
               type_map_pkg.map_domain(root_sn.V.TM)) then
            -- If the type was mapped make sure it was mapped to
            -- a type in the candidate operator. This is necessary
            -- because inputs are mapped for one operator at a time.
            if type_declaration_pkg.res_set_pkg.member(
                 type_map_pkg.fetch(root_sn.V.TM, the_tn),
                 type_declaration_pkg.map_range(c_inputs)) then
              type_declaration_pkg.remove(the_id, q_inputs);
            else
              success := false;
            end if;
          end if;
        )

        -- candidate
        foreach((the_id: psdl_id; the_tn: type_name),
                type_declaration_pkg.scan, (inputs(c_op)),
          if type_map_pkg.res_set_pkg.member(the_tn,
               type_map_pkg.map_range(root_sn.V.TM)) then
            type_declaration_pkg.remove(the_id, c_inputs);
          end if;
        )

        -- if the number of remaining input types for the query and
        -- the candidate are not equal then the operations cannot match
        if success then
          if type_declaration_pkg.size(q_inputs) /=
             type_declaration_pkg.size(c_inputs) then
            success := false;
          else
            -- if the node has already been expanded for inputs then
            -- all of its operators' inputs must already be mapped
            -- otherwise the node fails.
            if root_sn.expanded_for_inputs then
              success := type_declaration_pkg.size(q_inputs) = 0;
            else
              match(get_user_defined(q_inputs),
                    get_user_defined(c_inputs), root_sn, success);
            end if;
          end if;
        end if;

      end if;
    )

  end match_inputs;


  -- Function: verify_subtypes
  -- Description:
  function verify_subtypes(root_sn: in SigMatchNode) return boolean is
  begin
    -- TODO
    return true;
  end verify_subtypes;


  -- Procedure: match_ops
  -- Description: this is the main procedure for signature matching.
  --              Given the operations and their profiles for a query and a
  --              candidate, this method will return a SigMatchNode whose
  --              branches contain valid operation and type mappings.
  procedure match_ops(query, candidate: in OpWithProfileSeq;
                      root_sn: in out SigMatchNode) is
    return_val: SigMatchNode;
    temp_sn: SigMatchNodePtr;
    success, pruned: boolean;
    temp_query, temp_candidate: OpWithProfileSeq;
    temp_char: character;
  begin
    return_val := createSigMatchNode;
    sigMatchNodeAssign(return_val, root_sn);

    owp_sequence_pkg.assign(temp_query, query);
    owp_sequence_pkg.assign(temp_candidate, candidate);
    foreach((q_owp: OpWithProfile), owp_sequence_pkg.scan, (query),
      foreach((c_owp: OpWithProfile), owp_sequence_pkg.scan, (candidate),
        if q_owp.op_profile = c_owp.op_profile then
          temp_sn := new SigMatchNode'(createSigMatchNode);
          sigMatchNodeAssign(temp_sn.all, root_sn);
          op_map_pkg.bind(q_owp.op, c_owp.op, temp_sn.V.OM);
          if not validPairingExists(temp_sn.V.OM, return_val) then
            match_outputs(temp_sn.all, success);
            if success then
              passed_outputs := passed_outputs + 1;
              if match_basics(get_basics(inputs(q_owp.op)),
                              get_basics(inputs(c_owp.op))) then
                opWithProfileSeqRemove(q_owp, temp_query);
                opWithProfileSeqRemove(c_owp, temp_candidate);
                match_ops(temp_query, temp_candidate, temp_sn.all);
                addBranch(temp_sn, return_val);
                passed_basics := passed_basics + 1;
              else
                failed_basics := failed_basics + 1;
              end if;
            else
              failed_outputs := failed_outputs + 1;
            end if;
          else
            duplicates := duplicates + 1;
          end if;
        end if;
      )
    )

    -- prune leaf nodes until all leaves are valid solutions
    pruned := true;
    while pruned loop
      pruned := false;
      sigMatchNodeAssign(root_sn, return_val);
      foreach((leaf_snp: SigMatchNodePtr), sig_match_node_ptr_seq_pkg.scan,
              (getLeafNodePtrs(root_sn)),
        if leaf_snp.validation = UNKNOWN then
          match_inputs(leaf_snp.all, success);
          total_inputs := total_inputs + 1;
          if not success then
            leaf_snp.validation := INVALID;
          elsif not verify_subtypes(leaf_snp.all) then
            leaf_snp.validation := INVALID;
          else
            if sig_match_node_ptr_seq_pkg.length(
                 leaf_snp.branches) = 0 then
              leaf_snp.validation := VALID;
            else
              leaf_snp.expanded_for_inputs := true;
            end if;
          end if;
          if leaf_snp.validation = INVALID then
            -- removeBranch(leaf_snp, return_val);
            removeAllMatchingBranches(leaf_snp, return_val);
            failed_inputs := failed_inputs + 1;
            pruned := true;
          end if;
        end if;
      )
    end loop;

    -- recycle local variables
    owp_sequence_pkg.recycle(temp_query);
    owp_sequence_pkg.recycle(temp_candidate);

    sigMatchNodeAssign(root_sn, return_val);
  end match_ops;


  procedure sigMatchStatsReset is
  begin
    failed_outputs := 0;
    passed_outputs := 0;
    failed_basics := 0;
    passed_basics := 0;
    duplicates := 0;
    total_inputs := 0;
    failed_inputs := 0;
  end sigMatchStatsReset;

  procedure sigMatchStatsPut(filename: string) is
    the_file: file_type;
  begin
    create(the_file, out_file, filename);
    put(the_file, "Duplicates: ");
    put_line(the_file, integer'image(duplicates));
    put(the_file, "Passed Output Matching: ");
    put_line(the_file, integer'image(passed_outputs));
    put(the_file, "Failed Output Matching: ");
    put_line(the_file, integer'image(failed_outputs));
    put(the_file, "Passed Predefined Type Matching: ");
    put_line(the_file, integer'image(passed_basics));
    put(the_file, "Failed Predefined Type Matching: ");
    put_line(the_file, integer'image(failed_basics));
    put(the_file, "Total Inputs: ");
    put_line(the_file, integer'image(total_inputs));
    put(the_file, "Failed Inputs: ");
    put_line(the_file, integer'image(failed_inputs));
    close(the_file);
  end sigMatchStatsPut;

end sig_match;
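
The rule that match_basics applies to predefined types (remove exact matches first, then pair each remaining query type with the closest supertype left in the candidate) can be illustrated with a minimal Python sketch. It is not part of the thesis software: the SUPERTYPES table, the type names in it, and the list-based representation are assumptions made only for illustration, and subtype_of is assumed here to be reflexive and transitive.

```python
from typing import Dict, List

# Hypothetical partial ordering of basic types: child -> direct supertype.
SUPERTYPES: Dict[str, str] = {"natural": "integer", "integer": "float"}


def subtype_of(a: str, b: str) -> bool:
    """True if a equals b or b is reachable from a through SUPERTYPES."""
    while a != b:
        if a not in SUPERTYPES:
            return False
        a = SUPERTYPES[a]
    return True


def match_basics(q_basics: List[str], c_basics: List[str]) -> bool:
    """Greedy sketch of the predefined-type matching rule."""
    # A different number of basic types can never match.
    if len(q_basics) != len(c_basics):
        return False

    q_left = list(q_basics)
    c_left = list(c_basics)

    # Phase 1: filter out the basics that match exactly.
    for q in q_basics:
        if q in c_left:
            q_left.remove(q)
            c_left.remove(q)

    # Phase 2: pair each remaining query type with the closest remaining
    # supertype, i.e. one with no other candidate type lying below it.
    for q in list(q_left):
        supers = [c for c in c_left if subtype_of(q, c)]
        closest = [c for c in supers
                   if not any(c2 != c and subtype_of(c2, c) for c2 in supers)]
        if closest:
            q_left.remove(q)
            c_left.remove(closest[0])

    # Any query type left over has no partner, so the match fails.
    return not q_left
```

For example, under the assumed ordering a query with inputs natural and integer can match a candidate with inputs integer and float: integer matches exactly, and natural is then paired with float, its only remaining supertype.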
sig_match_types.ads

--------------------------------------------------------------------------------

with psdl_concrete_type_pkg; use psdl_concrete_type_pkg;
with psdl_component_pkg; use psdl_component_pkg;

with generic_map_pkg;
with generic_sequence_pkg;
with generic_set_pkg;
with ordered_set_pkg;

package sig_match_types is


  -- Types

  -- TypeMap
  package type_map_pkg is new generic_map_pkg(
    key => type_name,
    result => type_name,
    eq_key => equal,
    eq_res => equal,
    average_size => 4);
  subtype TypeMap is type_map_pkg.map;

  procedure typeNamePut(the_tn: type_name);

  procedure typeMapPut is new type_map_pkg.generic_put(
    key_put => typeNamePut, res_put => typeNamePut);

  procedure typeMapFilePut is new type_map_pkg.generic_file_put(
    key_put => typeNamePut, res_put => typeNamePut);

  -- OpMap
  package op_map_pkg is new generic_map_pkg(
    key => operator,
    result => operator,
    eq_key => eq,
    eq_res => eq,
    average_size => 4);
  subtype OpMap is op_map_pkg.map;

  procedure opPut(the_op: operator);

  procedure opMapPut is new op_map_pkg.generic_put(
    key_put => opPut, res_put => opPut);

  procedure opMapFilePut is new op_map_pkg.generic_file_put(
    key_put => opPut, res_put => opPut);


  -- SignatureMap
  type SignatureMap is record
    TM: TypeMap;
    OM: OpMap;
  end record;

  function createSignatureMap return SignatureMap;

  procedure addTypeMapping(tn1: in type_name; tn2: in type_name;
                           sm: in out SignatureMap);

  procedure addOpMapping(op1: in operator; op2: in operator;
                         sm: in out SignatureMap);

  function signatureMapEqual(sm1: in SignatureMap; sm2: in SignatureMap)
      return boolean;

  procedure signatureMapPut(sm: in SignatureMap);

  -- SignatureMapSet
  package sig_map_set_pkg is new generic_set_pkg(
    t => SignatureMap, eq => signatureMapEqual);

  subtype SignatureMapSet is sig_map_set_pkg.set;

  procedure signatureMapSetPut is
    new sig_map_set_pkg.generic_put(put => signatureMapPut);


  -- SigMatchNodePtr
  type SigMatchNode;
  type SigMatchNodePtr is access SigMatchNode;

  function sigMatchNodePtrEqual(smnp1: in SigMatchNodePtr;
                                smnp2: in SigMatchNodePtr) return boolean;

  function sigMatchNodePtrLessThan(smnp1: in SigMatchNodePtr;
                                   smnp2: in SigMatchNodePtr) return boolean;

  procedure sigMatchNodePtrPut(smnp: in SigMatchNodePtr);

  -- SigMatchNodePtrSeq
  package sig_match_node_ptr_seq_pkg is new generic_sequence_pkg(
    t => SigMatchNodePtr, average_size => 4);
  subtype SigMatchNodePtrSeq is sig_match_node_ptr_seq_pkg.sequence;

  function sigMatchNodePtrSeqEqual is
    new sig_match_node_ptr_seq_pkg.generic_equal(eq => sigMatchNodePtrEqual);

  function sigMatchNodePtrSeqMember is
    new sig_match_node_ptr_seq_pkg.generic_member(eq => sigMatchNodePtrEqual);

  procedure sigMatchNodePtrSeqRemove is
    new sig_match_node_ptr_seq_pkg.generic_remove(eq => sigMatchNodePtrEqual);

  procedure sigMatchNodePtrSeqPut is
    new sig_match_node_ptr_seq_pkg.generic_put(put => sigMatchNodePtrPut);

  -- SigMatchNodePtrSet
  package sig_match_node_ptr_set_pkg is new ordered_set_pkg(
    t => SigMatchNodePtr, eq => sigMatchNodePtrEqual,
    "<" => sigMatchNodePtrLessThan);
  subtype SigMatchNodePtrSet is sig_match_node_ptr_set_pkg.set;

  procedure sigMatchNodePtrSetPut is
    new sig_match_node_ptr_set_pkg.generic_put(put => sigMatchNodePtrPut);

  procedure sigMatchNodePtrSetPrint(the_set: SigMatchNodePtrSet);


  -- SigMatchNode
  type ValidationType is (UNKNOWN, VALID, INVALID);
  type SigMatchNode is record
    id: natural;
    signature_rank: float;
    semantic_rank: float;
    V: SignatureMap;
    validation: ValidationType;
    expanded_for_inputs: boolean;
    branches: SigMatchNodePtrSeq;
  end record;

  function createSigMatchNode return SigMatchNode;

  procedure addBranch(the_branch: in SigMatchNodePtr;
                      the_node: in out SigMatchNode);

  procedure removeBranch(the_branch: in SigMatchNodePtr;
                         the_node: in out SigMatchNode);

  procedure removeAllMatchingBranches(the_branch: in SigMatchNodePtr;
                                      the_node: in out SigMatchNode);

  function sigMatchNodeEqual(smn1: in SigMatchNode; smn2: in SigMatchNode)
      return boolean;

  function sigMatchNodeLessThan(smn1: in SigMatchNode; smn2: in SigMatchNode)
      return boolean;

  procedure sigMatchNodeAssign(smn1: in out SigMatchNode;
                               smn2: in SigMatchNode);

  procedure sigMatchNodePut(the_node: in SigMatchNode);

  procedure sigMatchNodePrint(the_node: SigMatchNode);

  procedure generateGML(the_node: in SigMatchNode; filename: in string);

  function getLeafNodePtrs(the_node: in SigMatchNode) return SigMatchNodePtrSeq;
  function getLeafNodePtrs(the_node: in SigMatchNode) return SigMatchNodePtrSet;

  function getValidLeafNodePtrs(the_node: in SigMatchNode)
      return SigMatchNodePtrSet;

  function validPairingExists(pairing: in OpMap; the_node: in SigMatchNode)
      return boolean;

end sig_match_types;
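
The SigMatchNode record declared above forms a tree through its branches field, and the ordered set of node pointers relies on sigMatchNodeLessThan, which places higher signature ranks first, then higher semantic ranks, and finally falls back on the unique node id. A minimal Python sketch of that shape follows. The class name Node, the use of -1.0 as a stand-in for RANK_UNKNOWN, and the treatment of a node without branches as its own leaf are assumptions made for illustration only, not details taken from the package.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Tuple

# Hypothetical counter standing in for the package-level unique node id.
_next_id = count()


@dataclass
class Node:
    signature_rank: float = -1.0  # assumed stand-in for RANK_UNKNOWN
    semantic_rank: float = -1.0   # assumed stand-in for RANK_UNKNOWN
    branches: List["Node"] = field(default_factory=list)
    id: int = field(default_factory=lambda: next(_next_id))


def leaves(node: Node) -> List[Node]:
    """Collect the leaf nodes below node, depth first, left to right."""
    if not node.branches:
        return [node]
    found: List[Node] = []
    for branch in node.branches:
        found.extend(leaves(branch))
    return found


def sort_key(node: Node) -> Tuple[float, float, int]:
    """Higher signature rank first, then higher semantic rank, then lower id."""
    return (-node.signature_rank, -node.semantic_rank, node.id)


def ranked_leaves(node: Node) -> List[Node]:
    """Leaves of node in the order described for the ordered pointer set."""
    return sorted(leaves(node), key=sort_key)
```

Calling ranked_leaves on a root node returns its leaves with the best-ranked mapping first; two leaves with identical ranks keep a stable, repeatable order because the id breaks the tie.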




--------------------------------------------------------------------------------

with text_io; use text_io;
with ada.float_text_io;

with psdl_concrete_type_pkg; use psdl_concrete_type_pkg;
with psdl_component_pkg; use psdl_component_pkg;

with candidate_types;

package body sig_match_types is


  -- Procedure: typeNamePut
  -- Description: outputs the type name's name
  procedure typeNamePut(the_tn: type_name) is
  begin
    if not equal(the_tn, null_type) then
      put(convert(the_tn.name));
    end if;
  end typeNamePut;

  -- Procedure: opPut
  -- Description: outputs the operator's name
  procedure opPut(the_op: operator) is
  begin
    if the_op /= null_component then
      put(convert(name(the_op)));
    end if;
  end opPut;

  -- Function: createSignatureMap
  -- Description: create and initialize a SignatureMap for use.
  function createSignatureMap return SignatureMap is
    return_val: SignatureMap;
  begin
    return_val.TM := type_map_pkg.create(null_type);
    return_val.OM := op_map_pkg.create(null_component);
    return return_val;
  end createSignatureMap;

  -- Procedure: addTypeMapping
  -- Description: binds two types together and adds them to the
  --              SignatureMap's TypeMap.
  procedure addTypeMapping(tn1: in type_name; tn2: in type_name;
                           sm: in out SignatureMap) is
  begin
    type_map_pkg.bind(tn1, tn2, sm.TM);
  end addTypeMapping;

  -- Procedure: addOpMapping
  -- Description: binds two operators together and adds them to the
  --              SignatureMap's OpMap.
  procedure addOpMapping(op1: in operator; op2: in operator;
                         sm: in out SignatureMap) is
  begin
    op_map_pkg.bind(op1, op2, sm.OM);
  end addOpMapping;
  -- Function: signatureMapEqual
  function signatureMapEqual(sm1: in SignatureMap; sm2: in SignatureMap)
      return boolean is
  begin
    return type_map_pkg.equal(sm1.TM, sm2.TM) and
           op_map_pkg.equal(sm1.OM, sm2.OM);
  end signatureMapEqual;

  -- Procedure: signatureMapPut
  procedure signatureMapPut(sm: in SignatureMap) is
  begin
    put("OM: ");
    opMapPut(sm.OM);
    put(" | TM: ");
    typeMapPut(sm.TM);
  end signatureMapPut;

  -- Function: sigMatchNodePtrEqual
  function sigMatchNodePtrEqual(smnp1: in SigMatchNodePtr;
                                smnp2: in SigMatchNodePtr) return boolean is
  begin
    return sigMatchNodeEqual(smnp1.all, smnp2.all);
  end sigMatchNodePtrEqual;

  -- Function: sigMatchNodePtrLessThan
  function sigMatchNodePtrLessThan(smnp1: in SigMatchNodePtr;
                                   smnp2: in SigMatchNodePtr) return boolean is
  begin
    return sigMatchNodeLessThan(smnp1.all, smnp2.all);
  end sigMatchNodePtrLessThan;

  -- Procedure: sigMatchNodePtrPut
  procedure sigMatchNodePtrPut(smnp: in SigMatchNodePtr) is
  begin
    sigMatchNodePut(smnp.all);
  end sigMatchNodePtrPut;


  -- Function: sigMatchNodeEqual
  function sigMatchNodeEqual(smn1: in SigMatchNode; smn2: in SigMatchNode)
      return boolean is
  begin
    if smn1.signature_rank /= smn2.signature_rank then
      return false;
    end if;

    if smn1.semantic_rank /= smn2.semantic_rank then
      return false;
    end if;

    if smn1.validation /= smn2.validation then
      return false;
    end if;

    if smn1.expanded_for_inputs /= smn2.expanded_for_inputs then
      return false;
    end if;

    if not signatureMapEqual(smn1.V, smn2.V) then
      return false;
    end if;

    return sigMatchNodePtrSeqEqual(smn1.branches, smn2.branches);
  end sigMatchNodeEqual;

  -- Function: sigMatchNodeLessThan
  function sigMatchNodeLessThan(smn1: in SigMatchNode;
                                smn2: in SigMatchNode) return boolean is
  begin
    if smn1.signature_rank > smn2.signature_rank then
      return true;
    -- the following test for less-than is just being paranoid
    -- about potential float equality problems
    elsif smn1.signature_rank < smn2.signature_rank then
      return false;
    elsif smn1.semantic_rank > smn2.semantic_rank then
      return true;
    -- the following test for less-than is just being paranoid
    -- about potential float equality problems
    elsif smn1.semantic_rank < smn2.semantic_rank then
      return false;
    else
      return smn1.id < smn2.id;
    end if;
  end sigMatchNodeLessThan;


  -- Procedure: sigMatchNodeAssign
  procedure sigMatchNodeAssign(smn1: in out SigMatchNode;
                               smn2: in SigMatchNode) is
  begin
    smn1.signature_rank := smn2.signature_rank;
    smn1.semantic_rank := smn2.semantic_rank;
    smn1.validation := smn2.validation;
    smn1.expanded_for_inputs := smn2.expanded_for_inputs;
    type_map_pkg.assign(smn1.V.TM, smn2.V.TM);
    op_map_pkg.assign(smn1.V.OM, smn2.V.OM);
    -- TODO: might have to do the deep copy myself here
    -- rather than call assign
    sig_match_node_ptr_seq_pkg.assign(smn1.branches, smn2.branches);
  end sigMatchNodeAssign;


  -- Procedure: sigMatchNodePut
  procedure sigMatchNodePut(the_node: in SigMatchNode) is
  begin
    put("(Signature Rank: ");
    if the_node.signature_rank = candidate_types.RANK_UNKNOWN then
      put("unknown");
    else
      ada.float_text_io.put(the_node.signature_rank, 1, 2, 0);
    end if;
    put(" | ");
    put("(Semantic Rank: ");
    if the_node.semantic_rank = candidate_types.RANK_UNKNOWN then
      put("unknown");
    else
      ada.float_text_io.put(the_node.semantic_rank, 1, 2, 0);
    end if;
    put(" | ");

    case the_node.validation is
      when UNKNOWN => put("Validation Unknown");
      when VALID => put("Valid");
      when INVALID => put("Invalid");
    end case;
    put(" | ");
    if the_node.expanded_for_inputs then
      put("Expanded");
    else
      put("Not Expanded");
    end if;
    put(" | ");

    put("Op Map: ");
    opMapPut(the_node.V.OM);
    put(" | ");
    put("Type Map: ");
    typeMapPut(the_node.V.TM);
    put(" | ");
    put("{Branches: ");
    sigMatchNodePtrSeqPut(the_node.branches);
    put("}");
    put(")");
    new_line;
  end sigMatchNodePut;


  -- Procedure: sigMatchNodePrint
  procedure sigMatchNodePrint(the_node: SigMatchNode) is
  begin
    put("Signature Rank: ");
    if the_node.signature_rank = candidate_types.RANK_UNKNOWN then
      put("unknown");
    else
      ada.float_text_io.put(the_node.signature_rank, 1, 2, 0);
    end if;
    new_line;
    put("Semantic Rank: ");
    if the_node.semantic_rank = candidate_types.RANK_UNKNOWN then
      put("unknown");
    else
      ada.float_text_io.put(the_node.semantic_rank, 1, 2, 0);
    end if;
    new_line;
    case the_node.validation is
      when UNKNOWN => put("Validation Unknown");
      when VALID => put("Valid");
      when INVALID => put("Invalid");
    end case;
    put(", ");
    if the_node.expanded_for_inputs then
      put_line("Expanded");
    else
      put_line("Not Expanded");
    end if;
    put("Op Map: ");
    opMapPut(the_node.V.OM);
    new_line;
    put("Type Map: ");
    typeMapPut(the_node.V.TM);
    new_line;
    put("Branches: ");
    sigMatchNodePtrSeqPut(the_node.branches);
    new_line;
  end sigMatchNodePrint;


  -- Function: createSigMatchNode
  -- Description: create and initialize a SigMatchNode for use.
  --              Note, a unique node id is maintained to facilitate
  --              sorting when two nodes have equal signature and
  --              semantic ranks.
  unique_node_id: natural := 0;
  function createSigMatchNode return SigMatchNode is
    return_val: SigMatchNode;
  begin
    return_val.id := unique_node_id;
    unique_node_id := unique_node_id + 1;
    return_val.signature_rank := candidate_types.RANK_UNKNOWN;
    return_val.semantic_rank := candidate_types.RANK_UNKNOWN;
    return_val.validation := UNKNOWN;
    return_val.expanded_for_inputs := false;
    return_val.V := createSignatureMap;
    return_val.branches := sig_match_node_ptr_seq_pkg.empty;
    return return_val;
  end createSigMatchNode;


  -- Procedure: addBranch
  -- Description: add a branch (a child SigMatchNode) to the SigMatchNode.
  --              A branch represents a superset of the node it belongs to.
  --              What this really means is the branch node contains all the
  --              type and operator mappings of the node it belongs to,
  --              plus more.
  procedure addBranch(the_branch: in SigMatchNodePtr;
                      the_node: in out SigMatchNode) is
  begin
    sig_match_node_ptr_seq_pkg.add(the_branch, the_node.branches);
  end addBranch;

  -- Procedure: removeBranch
  -- Description:
  procedure removeBranch(the_branch: in SigMatchNodePtr;
                         the_node: in out SigMatchNode) is
  begin
    sigMatchNodePtrSeqRemove(the_branch, the_node.branches);
  end removeBranch;

  -- Procedure: removeAllMatchingBranches
  -- Description:
  procedure removeAllMatchingBranches(the_branch: in SigMatchNodePtr;
                                      the_node: in out SigMatchNode) is
  begin
    sigMatchNodePtrSeqRemove(the_branch, the_node.branches);
    foreach((branch: SigMatchNodePtr), sig_match_node_ptr_seq_pkg.scan,
            (the_node.branches),
      removeAllMatchingBranches(the_branch, branch.all);
    )
  end removeAllMatchingBranches;


  -- Procedure: generateGML
  -- Description: generate a GML file to graphically represent the
  --              SigMatchNode's relationship with its branches.
  procedure generateGML(the_node: in SigMatchNode; filename: string) is
    id: natural := 0;   -- unique ID counter
    the_id: natural;    -- place holder for call to put_node_gml
    gml_file: file_type;

    function new_id return natural is
    begin
      id := id + 1;
      return id;
    end new_id;

    procedure put_node_gml(sn: in SigMatchNode; my_id: out natural) is
      child_id: natural;
    begin
      my_id := new_id;
      put(gml_file, "node [ id ");
      put(gml_file, integer'image(my_id));
      put(gml_file, " label """);
      opMapFilePut(gml_file, sn.V.OM);
      put_line(gml_file, "\");
      typeMapFilePut(gml_file, sn.V.TM);
      put_line(gml_file, "\");
      case sn.validation is
        when UNKNOWN => put(gml_file, "Validation Unknown");
        when VALID => put(gml_file, "Valid");
        when INVALID => put(gml_file, "Invalid");
      end case;
      put_line(gml_file, "\");
      if sn.expanded_for_inputs then
        put(gml_file, "Expanded");
      else
        put(gml_file, "Not Expanded");
      end if;
      put_line(gml_file, """ ]");

      -- recursively call put_node_gml for each of its branches
      foreach((branch: SigMatchNodePtr), sig_match_node_ptr_seq_pkg.scan,
              (sn.branches),
        put_node_gml(branch.all, child_id);

        -- make the edge to the branch
        put(gml_file, "edge [ id ");
        put(gml_file, integer'image(new_id));
        put(gml_file, " source ");
        put(gml_file, integer'image(my_id));
        put(gml_file, " target ");
        put(gml_file, integer'image(child_id));
        put_line(gml_file, " ]");
      )
    end put_node_gml;

  begin
    create(gml_file, out_file, filename);
    put(gml_file, "graph [ id ");
    put(gml_file, integer'image(new_id));
    put_line(gml_file, " directed 1");
    put_node_gml(the_node, the_id);
    put_line(gml_file, "]");
    close(gml_file);
  end generateGML;


-- Function: getLeafNodePtrs
-- Description: collect the leaf nodes of the node into a sequence.
function getLeafNodePtrs(the_node: in SigMatchNode)
    return SigMatchNodePtrSeq is
  return_val: SigMatchNodePtrSeq;

  procedure processNode(smnp: in SigMatchNodePtr) is
  begin
    if sig_match_node_ptr_seq_pkg.length(smnp.branches) = 0 then
      sig_match_node_ptr_seq_pkg.add(smnp, return_val);
      return;
    end if;

    foreach((branch: SigMatchNodePtr), sig_match_node_ptr_seq_pkg.scan,
            (smnp.branches),
      processNode(branch);
    )
  end processNode;

begin
  return_val := sig_match_node_ptr_seq_pkg.empty;
  foreach((branch: SigMatchNodePtr), sig_match_node_ptr_seq_pkg.scan,
          (the_node.branches),
    processNode(branch);
  )
  return return_val;
end getLeafNodePtrs;


-- Function: getLeafNodePtrs
-- Description: collect the leaf nodes of the node into a set.
--              Note the set will keep duplicates out.
function getLeafNodePtrs(the_node: in SigMatchNode)
    return SigMatchNodePtrSet is
  return_val: SigMatchNodePtrSet;

  procedure processNode(smnp: in SigMatchNodePtr) is
  begin
    if sig_match_node_ptr_seq_pkg.length(smnp.branches) = 0 then
      sig_match_node_ptr_set_pkg.add(smnp, return_val);
      return;
    end if;

    foreach((branch: SigMatchNodePtr), sig_match_node_ptr_seq_pkg.scan,
            (smnp.branches),
      processNode(branch);
    )
  end processNode;

begin
  return_val := sig_match_node_ptr_set_pkg.empty;
  foreach((branch: SigMatchNodePtr), sig_match_node_ptr_seq_pkg.scan,
          (the_node.branches),
    processNode(branch);
  )
  return return_val;
end getLeafNodePtrs;


-- Function: getValidLeafNodePtrs
-- Description: collect the valid leaf nodes of the node into a set.
--              Note the set will keep duplicates out.
function getValidLeafNodePtrs(the_node: in SigMatchNode)
    return SigMatchNodePtrSet is
  return_val: SigMatchNodePtrSet;

  procedure processNode(smnp: in SigMatchNodePtr) is
  begin
    if sig_match_node_ptr_seq_pkg.length(smnp.branches) = 0 then
      if smnp.validation = VALID then
        sig_match_node_ptr_set_pkg.add(smnp, return_val);
      end if;
      return;
    end if;

    foreach((branch: SigMatchNodePtr), sig_match_node_ptr_seq_pkg.scan,
            (smnp.branches),
      processNode(branch);
    )
  end processNode;

begin
  return_val := sig_match_node_ptr_set_pkg.empty;
  foreach((branch: SigMatchNodePtr), sig_match_node_ptr_seq_pkg.scan,
          (the_node.branches),
    processNode(branch);
  )
  return return_val;
end getValidLeafNodePtrs;


-- Function: validPairingExists
-- Description: gets all the valid leaf nodes and checks if the pairing
--              exists in any of them
function validPairingExists(pairing: in OpMap; the_node: in SigMatchNode)
    return boolean is
  return_val: boolean;
begin
  return_val := false;
  foreach((sn: SigMatchNodePtr), sig_match_node_ptr_set_pkg.scan,
          (getValidLeafNodePtrs(the_node)),
    if not return_val then
      return_val := op_map_pkg.submap(pairing, sn.V.OM);
      -- TODO: if return_val is true then should immediately return
      -- but foreach doesn't let me do this
    end if;
  )
  return return_val;
end validPairingExists;


-- Procedure: sigMatchNodePtrSetPrint
procedure sigMatchNodePtrSetPrint(the_set: SigMatchNodePtrSet) is
begin
  foreach((the_node: SigMatchNodePtr), sig_match_node_ptr_set_pkg.scan,
          (the_set),
    sigMatchNodePrint(the_node.all);
    new_line;
  )
end sigMatchNodePtrSetPrint;


end sig_match_types;
software_base.ads


with component_id_types; use component_id_types;
with haase_diagram; use haase_diagram;
with candidate_types; use candidate_types;
with profile_types; use profile_types;
package software_base is

  procedure initialize(header_filename: in string);
  function numComponents return natural;
  function numPartitions return natural;
  function numOccupiedPartitions return natural;
  procedure generateGML(gml_filename: in string);

  function profileFilter(query_filename: in string) return CandidateSet;

  function signatureMatch(query_filename: in string;
                          the_candidate: in Candidate) return Candidate;

  function getProfileID(p: Profile) return ProfileID;
  function getProfile(p_id: ProfileID) return Profile;
  function getProfileIDs return profile_lookup_table_pkg.res_set;


private

  -- the component_id_map
  the_component_id_map: ComponentIDMap;

  -- the haase_diagram
  the_haase_diagram: HaaseDiagram;

  -- the profile_lookup_table
  the_profile_lookup_table: ProfileLookupTable;

end software_base;
software_base.g


with text_io; use text_io;
with ada.integer_text_io; use ada.integer_text_io;

with a_strings;
with psdl_concrete_type_pkg; use psdl_concrete_type_pkg;

with component_id_types; use component_id_types;
with haase_diagram; use haase_diagram;
with candidate_types; use candidate_types;
with profile_types; use profile_types;
with psdl_profile; use psdl_profile;
with sig_match_types; use sig_match_types;
with profile_filter_pkg;
with sig_match;

package body software_base is

-- Procedure: initialize
-- Description: reads the header file to construct the_component_id_map
--              and the_haase_diagram.
procedure initialize(header_filename: in string) is
  use a_strings;

  header_file: file_type;
  comp_id: ComponentID;
  dir_name: a_string;
  input_line: string(1..256);
  line_length: natural;
  comp_id_last: natural;
  temp_comp_profile: ComponentProfile;
  temp_haase_node: HaaseNode;
  temp_component: Component;
  the_generics_maps: GenericsMapSet;
  generics_mapping: GenericsMap;

  id: natural := 0;
  old_start: natural := 0;
  function new_id(start: natural) return natural is
  begin
    if start /= old_start then
      id := 0;
      old_start := start;
    end if;
    id := id + 1;
    return start + id;
  end new_id;


begin
  -- parse header file and construct the_component_id_map
  component_id_map_pkg.create(createComponent, the_component_id_map);

  open(header_file, in_file, header_filename);
  while (not end_of_file(header_file)) loop
    get_line(header_file, input_line, line_length);
    get(input_line, comp_id, comp_id_last);

    -- trim spaces before and after directory name
    dir_name := reverse_order(trim(
                  reverse_order(trim(a_strings.to_a(
                    input_line(comp_id_last+1..line_length))))));

    put("preparing ");
    put(dir_name.s);
    put(" as ");

    -- create a component for each generic_mapping
    the_generics_maps := getGenericsMaps(convert(text(dir_name & "/PSDL_SPEC")));
    put(integer'image(generics_map_set_pkg.size(the_generics_maps)));
    put(" components ... ");

    foreach((the_map: GenericsMap), generics_map_set_pkg.scan,
            (the_generics_maps),
      temp_component := createComponent;
      temp_component.psdl_filename := text(dir_name & "/PSDL_SPEC");
      generics_map_pkg.assign(temp_component.generics_mapping, the_map);
      component_id_map_pkg.bind(new_id(comp_id), temp_component,
                                the_component_id_map);
    )

    put_line("done");
  end loop;
  close(header_file);


  -- Create the ProfileLookupTable
  the_profile_lookup_table :=
    profile_lookup_table_pkg.create(DEFAULT_PROFILE_ID);

  -- construct haase diagram
  the_haase_diagram := createHaaseDiagram;

  -- for each item in the component_id_map, get the component's
  -- profile and add it to the haase diagram
  foreach((the_comp_id: ComponentID; the_component: Component),
          component_id_map_pkg.scan, (the_component_id_map),
    put("inserting ");
    put(integer'image(the_comp_id));
    put(" ... ");

    temp_comp_profile := getComponentProfile(
        convert(the_component.psdl_filename), the_component.generics_mapping);

    -- check if haase node with temp_comp_profile as its key
    -- already exists. If it does then add the component id
    -- to that node rather than make a new node.
    if haase_node_map_pkg.member(temp_comp_profile, the_haase_diagram) then
      temp_haase_node := haase_node_map_pkg.fetch(the_haase_diagram,
                                                  temp_comp_profile);
    else
      temp_haase_node := createHaaseNode(temp_comp_profile);
    end if;

    addComponent(the_comp_id, temp_haase_node);
    addHaaseNode(temp_haase_node, the_haase_diagram);
    put_line("done");
  )

  put("Profile Lookup Table: ");
  profileLookupTablePut(the_profile_lookup_table);
  new_line;

  put("adding base nodes...");
  addBaseNodes(the_haase_diagram);
  put_line("done");
  put("connecting nodes...");
  connectNodes(the_haase_diagram);
  put_line("done");
end initialize;


-- Function: numComponents
-- Description: return the number of components in the software base.
function numComponents return natural is
  return_val: natural;
begin
  return component_id_map_pkg.size(the_component_id_map);
end numComponents;


-- Function: numPartitions
-- Description: return the number of partitions in the software base.
function numPartitions return natural is
begin
  return haase_node_map_pkg.size(the_haase_diagram);
end numPartitions;


-- Function: numOccupiedPartitions
-- Description: return the number of occupied partitions in the
--              software base.
function numOccupiedPartitions return natural is
  return_val: natural := 0;
begin
  foreach((the_key: ComponentProfile; the_hn: HaaseNode),
          haase_node_map_pkg.scan, (the_haase_diagram),
    if component_id_set_pkg.size(the_hn.components) > 0 then
      return_val := return_val + 1;
    end if;
  )
  return return_val;
end numOccupiedPartitions;


-- Procedure: generateGML
procedure generateGML(gml_filename: string) is
begin
  generateGML(the_haase_diagram, gml_filename);
end generateGML;


-- Function: profileFilter
-- Description: performs profile filtering with the PSDL specified query
--              and returns an ordered set of candidates with the highest
--              profile ranking first.
--              Note the PSDL query must NOT contain generics.
function profileFilter(query_filename: in string) return CandidateSet is
  query_profile: ComponentProfile;
begin
  query_profile := getComponentProfile(query_filename,
                                       generics_map_pkg.create(empty));
  return profile_filter_pkg.findCandidates(query_profile, the_haase_diagram);
end profileFilter;


-- Function: signatureMatch
-- Description: performs signature matching between the PSDL specified
--              query and the candidate and returns a copy of the_candidate
--              with the signature_matches field set.
function signatureMatch(query_filename: in string;
                        the_candidate: in Candidate) return Candidate is
  q_ops, c_ops: OpWithProfileSeq;
  sn: SigMatchNode;
  temp_snp_set: SigMatchNodePtrSet;
  temp_component: Component;
  return_val: Candidate;
begin
  -- get the query's operators
  q_ops := getOpsWithProfiles(query_filename, generics_map_pkg.create(empty));
  new_line;
  put_line("Query: ");
  opWithProfileSeqPrint(q_ops);
  new_line;

  -- get the candidate's operators
  temp_component := component_id_map_pkg.fetch(the_component_id_map,
                                               the_candidate.component_id);
  c_ops := getOpsWithProfiles(convert(temp_component.psdl_filename),
                              temp_component.generics_mapping);
  put("Candidate: ");
  put_line(integer'image(the_candidate.component_id));
  put("Generics Mapping: ");
  genericsMapPut(temp_component.generics_mapping);
  new_line;
  new_line;
  opWithProfileSeqPrint(c_ops);
  new_line;

  -- perform signature matching
  sn := createSigMatchNode;
  sig_match.sigMatchStatsReset;
  sig_match.match_ops(q_ops, c_ops, sn);

  -- calculate the signature ranks
  sig_match_node_ptr_set_pkg.assign(temp_snp_set, getLeafNodePtrs(sn));
  foreach((smnp: SigMatchNodePtr), sig_match_node_ptr_set_pkg.scan,
          (temp_snp_set),
    smnp.signature_rank := float(op_map_pkg.size(smnp.V.OM)) /
                           float(owp_sequence_pkg.length(q_ops));
    -- The following calculation for signature rank measures how well the
    -- signature matching method works on its own. The calculation above
    -- is really a mixture of profile filtering AND signature matching.
    -- smnp.signature_rank := float(op_map_pkg.size(smnp.V.OM)) /
    --   (return_val.profile_rank * float(owp_sequence_pkg.length(q_ops)));
  )

  -- add each SigMatchNodePtr to make sure return_val's signature_matches
  -- field is sorted
  candidateAssign(return_val, the_candidate);
  foreach((smnp: SigMatchNodePtr), sig_match_node_ptr_set_pkg.scan,
          (temp_snp_set),
    sig_match_node_ptr_set_pkg.add(smnp, return_val.signature_matches);
  )

  return return_val;
end signatureMatch;


-- Function: getProfileID
-- Description: if the profile doesn't exist then add it first then
--              return its id. A new id is obtained from the global
--              variable unique_profile_id.
unique_profile_id: ProfileID := 0;

function getProfileID(p: Profile) return ProfileID is
  return_val: ProfileID;
begin
  return_val :=
    profile_lookup_table_pkg.fetch(the_profile_lookup_table, p);
  if return_val = DEFAULT_PROFILE_ID then
    return_val := unique_profile_id;
    unique_profile_id := unique_profile_id + 1;
    put("binding: ");
    profilePut(p);
    put(" to ");
    put(integer'image(return_val));
    put(", ");
    profile_lookup_table_pkg.bind(p, return_val, the_profile_lookup_table);
  end if;

  return return_val;
end getProfileID;



-- Function: getProfile
function getProfile(p_id: ProfileID) return Profile is
  return_val: Profile;
begin
  return_val := 0;
  foreach((p: Profile; id: ProfileID), profile_lookup_table_pkg.scan,
          (the_profile_lookup_table),
    if id = p_id then
      return_val := p;
      -- TODO: should return here but foreach doesn't let me
    end if;
  )
  return return_val;
end getProfile;


-- Function: getProfileIDs
function getProfileIDs return profile_lookup_table_pkg.res_set is
begin
  return profile_lookup_table_pkg.map_range(the_profile_lookup_table);
end getProfileIDs;


end software_base;